FOREWORD


In  a  conversation  with  Roger  MacGowan,  Dr.  John  H.  Giese,  Chairman 
of  the  Numerical  Analysis  Conference,  raised  the  question  of  holding  the 
1971  Army  Numerical  Analysis  Conference  at  the  Department  of  Defense 
Computer  Institute,  Washington  Navy  Yard,  Washington,  D.  C.  Mr.  MacGowan, 
formerly  with  the  Army  Missile  Command  at  Redstone  Arsenal,  is  now  with 
DODCI. He has been an active participant in this series of conferences
and  was  willing  to  discuss  the  possibility  of  his  installation  serving  as  host 
for the 1971 Army Numerical Analysis Conference with CPT George P. Sotos,
USN, Director of DODCI. The facilities at the Washington Navy Yard were
exceptionally fine for holding a conference of this size, and those in
attendance are indebted to CPT Sotos for agreeing to host it at his
installation. They are also indebted to Roger MacGowan for serving as
Chairman on Local Arrangements. He and LTC F. D. Troyan did an outstanding
job in handling the many problems that arose during the conference.

The  following  information  taken  from  a  folder  issued  by  DODCI  will 
serve to acquaint interested individuals with the services offered by the
host  of  this  conference.  "The  Department  of  Defense  Computer  Institute  is 
a  jointly  staffed  activity  established  by  the  Secretary  of  Defense  under 
the  Executive  Agency  of  the  Secretary  of  the  Navy.  The  Institute  functions 
under  the  sponsorship  of  the  Chief  of  Naval  Operations.  The  Department  of 
Defense  Computer  Institute  provides  computer  orientation  courses  which  are 
designed  to  acquaint  SENIOR  MILITARY  AND  CIVILIAN  DOD  EXECUTIVES  with  the 
application,  operation  and  selection  of  digital  computer  systems.  The 
courses  provide  a  comprehensive  view  of  the  computer  field,  and  are  directed 
to:

1.  Teaching  the  fundamentals  of  digital  computer  capabilities, 
applications,  and  limitation. 

2.  Planning  and  implementation  of  new  digital  computer  systems  and 
improving  existing  systems. 

3.  Enabling  DOD  to  plan  and  operate  its  systems  more  independently 
of  contractors. 

Requests  for  information  should  be  addressed  to: 

Director
Department of Defense Computer Institute
Bldg. 175
Washington Navy Yard
Washington, D. C. 20390."

The theme of the invited addresses was Computer Aided Design and
Engineering.  Dr.  James  H.  Griesmer  with  International  Business  Machines 
Corporation was the lead-off speaker. His address was entitled
"Scratchpad/1: An Interactive Facility for Symbolic Mathematics."
Professor J. C. R. Licklider, Director of Project MAC, spoke on "Modern
Facilities for Man Computer Interaction." Professor A. Van Dam, Brown
University, concluded the conference with a talk on "Low Cost Interactive
Computer Graphics." Besides these invited addresses, there were fourteen
informative contributed papers.

The  Army  Mathematics  Steering  Committee  sponsors  these  conferences 
on  behalf  of  the  Office  of  the  Chief  of  Research  and  Development.  Members 
of  this  committee  have  asked  that  most  of  the  papers  presented  at  this 
symposium  be  printed  in  these  Proceedings  in  order  that  persons  unable 
to  attend  the  meeting  may  become  acquainted  with  their  informative  contents. 
They  would  like  to  thank  Dr.  John  H.  Giese  and  members  of  his  Program 
Committee  for  organizing  this  1971  Army  Numerical  Analysis  Conference.  They 
appreciate  very  much  the  time  and  efforts  of  the  many  speakers  and  chairmen 
who really made this conference such an interesting and scientific event.


SCRATCHPAD/1: AN INTERACTIVE FACILITY
FOR SYMBOLIC MATHEMATICS

James  H.  Griesmer 

IBM  Thomas  J.  Watson  Research  Center 
Yorktown  Heights,  New  York 

and 

University  of  California 
Berkeley,  California 

1. INTRODUCTION. In the past few years considerable interest
has been shown in the use of computers to carry out symbolic
mathematical computations. A recent conference, the Second Symposium on
Symbolic and Algebraic Manipulation <9>, contained nearly 50 papers
on the subject, including several papers on computer systems which
provide facilities for algebraic computation. SCRATCHPAD/1 is one
such system, distinguished both by its powerful symbolic capability
and by its user language, whose notations resemble those of
conventional mathematics.

SCRATCHPAD/1  has  been  implemented  in  the  LISP  programming 
language  using  an  experimental  System/360  LISP  system.  The  principal 
features  of  this  LISP  system  which  enhance  its  capability  for  symbolic 
and  algebraic  computations  are  provisions  for  unlimited  precision 
integer  arithmetic  and  for  accessing  a  sizable  number  of  LISP  programs. 

This  latter  capability  has  enabled  other  symbolic  systems  written 
in  LISP  to  be  incorporated  into  the  SCRATCHPAD  library.  Significant 
portions  of  the  following  systems  are  simultaneously  available  to 
the SCRATCHPAD user: REDUCE2 <4>, MATHLAB <2>, SIN <8>, and
Korsvold's On-Line Simplification System <6>.

An interactive version of SCRATCHPAD is available to users on an
S/360 Model 67 under the CP/CMS time-sharing system; a batch version
has been implemented on an S/360 Model 91 under OS/360.

2.  SYSTEM  ORGANIZATION  AND  CAPABILITIES.  SCRATCHPAD  may  be 
viewed  as  having  four  components: 

an input translator to convert the input strings (commands) from
the user into a form suitable for interpretation;

an output translator to convert expressions from an internal
form to a two-dimensional format for output to the user;

a library containing the bulk of the algebraic manipulation
facilities; and

an evaluator which causes the commands issued by the user to
be carried out.

(i)  The  input  translator.  User  commands  in  the  source  language 
of  SCRATCHPAD  are  accepted  by  the  input  translator  and  converted  into 
a  form  suitable  for  interpretation  by  the  evaluator.  The  current 
mode  of  input  for  interactive  use  of  the  system  is  via  an  IBM  2741 
communications  terminal. 

The SCRATCHPAD language allows mathematical entities with
two-dimensional graphics; e.g.,

$$x_i(t) \qquad {}_{l}\,y^{i}_{j,k} \qquad \sum_{j=1}^{n} j^{2}$$

For keyboard input all two-dimensional forms are linearized in a
straightforward manner; for example, the above entities are
linearized as follows:

    x<i>(t)    y<j,k;i;;l>    sum<j=1;n> j**2

(ii) The output translator. A modification of the CHARYBDIS
program from MATHLAB <7> is used in SCRATCHPAD to produce a
two-dimensional image of a mathematical expression. CHARYBDIS output
may be obtained either from a 2741 terminal or from any other line
printing output device.

(iii)  The  library  of  SCRATCHPAD.  The  following  symbolic 
capabilities,  most  of  which  were  originally  written  for  other 
systems,  are  currently  available  in  SCRATCHPAD/1: 

    Simplification (R,M,K,S,N)
    Polynomial Greatest Common Divisor (R)
    Differentiation (R,N)
    Integration (M,S)
    Polynomial Factorization (M)
    Direct and Inverse Laplace Transforms (M,K)
    Matrix Operations (R,N)
    Solutions of Systems of Linear Equations (N)

The letters "R,M,K,S" refer, respectively, to REDUCE2, MATHLAB,
Korsvold, and SIN. The letter "N" refers to newly created facilities.

The following is indicative of the variety of problems for which the
library  has  been  utilized: 

(a) generation and study of polynomials arising in graph theory
and sorting;

(b) inversion of transition matrices resulting from a problem
in data compression;

(c) symbolic differentiation and substitution required in a
study of wave propagation in an elastic medium;

(d) symbolic triple integrations arising in queuing theory;

(e) solution of an eight-dimensional system of linear symbolic
equations arising from research on optimal difference formulae;

(f) investigation of subdeterminants derived from transformations
applied to systems of nonlinear ordinary differential equations.

(iv) The SCRATCHPAD evaluator. Evaluation consists of a
systematic transformation of an entity such as a mathematical
expression, as governed by the current "environment", that is, the set of
all definitions, rules, and flag settings in effect at a particular
instant in time. The SCRATCHPAD system provides the user with
considerable control over the evaluation process, by allowing him to
introduce new substitution rules and pattern-matching rules, and to
modify system flags and variables.

3.  THE  SCRATCHPAD  LANGUAGE.  The  SCRATCHPAD  language  is  designed 
primarily  for  interactive  use  by  a  mathematician  unskilled  as  a 
programmer.  It  features  a  syntax  which  is  simple  and  concise,  yet 
rich  in  mathematical  constructs.  Its  design  currently  provides  a 
framework  for  manipulation  of  all  of  the  following: 

(a)  finite  and  infinite  sums,  products,  and  sequences; 

(b)  relations  such  as  equations  and  inequalities; 

(c)  arbitrarily  indexed  variables,  functions,  and  operators; 

(d)  sets; 

(e)  arrays  of  arbitrary  dimension. 

In this paper we shall mention some of the essential features of
the SCRATCHPAD language. A more complete description of the user
language is contained in <3>.


In order to illustrate some of the features of the SCRATCHPAD
language, we shall consider the calculation of Legendre polynomials
using the recurrence relation:

$$P_n = \frac{2n-1}{n}\, x\, P_{n-1} - \frac{n-1}{n}\, P_{n-2}$$

subject to the initial conditions

$$P_0 = 1, \qquad P_1 = x.$$

To enter this recursive definition into the system, the SCRATCHPAD
user need merely type the following four STATEMENTS, issued as
individual COMMANDS, using the above rules for linearization:

    p<0> = 1
    p<1> = x
    n in (2,3,...)
    p<n> = ((2*n-1) * x * p<n-1> - (n-1) * p<n-2>)/n

To get the first 5 Legendre polynomials as output, the user types
the COMMAND:

    p<n>, n in (0,1,...,4)

and the system responds with

    P  : 1
     0

    P  : X
     1

           2
         3X  - 1
    P  : --------
     2      2

and so on.
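
The recurrence above can also be checked outside SCRATCHPAD. The following is a minimal Python sketch, not part of the SCRATCHPAD system: the function name `legendre` and the coefficient-list representation (lowest degree first) are assumptions made for illustration, and exact rational arithmetic stands in for SCRATCHPAD's unlimited precision integers.

```python
from fractions import Fraction


def legendre(n_max):
    """Return coefficient lists (lowest degree first) for P_0 .. P_n_max.

    Uses P_n = ((2n-1) * x * P_{n-1} - (n-1) * P_{n-2}) / n
    with P_0 = 1 and P_1 = x, in exact rational arithmetic.
    """
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(2, n_max + 1):
        prev, prev2 = polys[n - 1], polys[n - 2]
        # (2n-1) * x * P_{n-1}: multiplying by x shifts every coefficient up
        term = [Fraction(0)] + [(2 * n - 1) * c for c in prev]
        # subtract (n-1) * P_{n-2}
        for k, c in enumerate(prev2):
            term[k] -= (n - 1) * c
        # divide the whole polynomial by n
        polys.append([c / n for c in term])
    return polys[: n_max + 1]


if __name__ == "__main__":
    for n, p in enumerate(legendre(4)):
        print(n, [str(c) for c in p])
```

Each list holds the coefficients of 1, x, x**2, and so on, so the entry for n = 2 corresponds to the polynomial (3x**2 - 1)/2 shown in the system response.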

The  above  example  illustrates  several  aspects  of  the  SCRATCHPAD 
language  besides  its  closeness  to  conventional  mathematical  notation. 
STATEMENTS are the fundamental constructs for making definitions and
declarations. They consist of:

(i) a left part: usually a VARIABLE or a FORM (e.g.,
x<i>, or f(x));

(ii) a relator: >, >=, =, <=, <, in;

(iii) a right part: an EXPRESSION.

VARIABLES and FORMS, together with (unlimited precision)
INTEGERS, VECTORS, and SETS make up the PRIMITIVES of the language.
PRIMITIVES combine with infix and prefix operators to form EXPRESSIONS.

The  ellipsis  (...)  may  be  used  in  sums,  products,  and  sequences 
to  indicate  missing  terms;  e.g., 

1  +  2  +  ...  +  n  j  in  (1,2,...) 

STATEMENTS may be issued singly as COMMANDS:

j  in  (1,2,...) 

in which case they are in effect during the evaluation of all
subsequent COMMANDS. On the other hand, several STATEMENTS may be
issued as a single COMMAND:

f(x)  =  x**j,  x  >  0,  j  in  (1,2,...) 

Here  the  STATEMENTS  on  the  right  affect  only  the  environment  for 
evaluation  of  STATEMENTS  to  their  left.  As  suggested  by  the  examples 
above,  STATEMENTS  are  used  to  assign  a  value,  or  more  generally  a 
range  of  possible  values  to  a  VARIABLE  or  FORM. 

The SCRATCHPAD notation for VECTORS has been derived from
SYMBAL <1>. The elements of a VECTOR can be labeled:

    (1: 1 + x, 2: 1 + 2*x, 3: 1 + 3*x, 4: 1 + 4*x)

or unlabeled:

    (1 + x, 1 + 2*x, 1 + 3*x, 1 + 4*x)

Ellipses  can  be  used  in  a  variety  of  ways;  e.g.,  to  indicate  an 
infinite  VECTOR  or  sequence: 

    (1,3,5,...)

Matrices are viewed as VECTORS of (row) VECTORS:

    a 2 x 2 matrix:         ((a,b), (c,d))

    the n-dimensional       (i: (j: i - j)),
    identity matrix:        (i,j) in (1,2,...,n)


4.  USER  CONTROL.  After  a  COMMAND  is  typed  into  the  system,  its 
constituent  STATEMENTS  are  evaluated  from  right  to  left.  Evaluation 
of EXPRESSIONS in STATEMENTS consists of several phases, including
simplification, substitution, expansion, pattern matching, and
expression restructuring.


SCRATCHPAD  provides  the  user  with  a  wide  variety  of  controls 
over  the  evaluation  process  in  the  form  of  flags,  special  variables, 
and  selection  capabilities.  For  example,  by  controlling  the  value 
of  the  special  variable  EXP,  he  may  control  whether  EXPRESSIONS 
such  as 


(x+5)  *  (x-3)  or  (x+4)  **  10 
will  be  expanded  in  full. 

Other  special  variables  allow  the  user  to  control  the  formatting 
of  expressions;  e.g.,  the  STATEMENT 

ORDER  =  (z,y) 

indicates that 'z' is to occur before 'y' in the output of
polynomial products. Also, the variable FACTORS can be set by the user to
a list of those variables which are to be factored out of expressions
on output.

If the user wishes to have previous results saved on a "work-file"
on secondary storage, he can set the value of the special
variable  HISTORY  to  1.  Any  previously  computed  result  can  then  be 
recalled  by  referring  to  its  integer  label. 

Another  aspect  of  the  user  control  is  the  availability  of  the 
WHERE-clause  to  qualify  an  EXPRESSION.  For  example, 

...  +  (q(x)  where  x=b)  +  ... 

causes q(x) to be evaluated at the point x=b, regardless of any
previous value that x was assigned.

Other facilities enable the user to access sub-expressions or
coefficients of a variable within an EXPRESSION; e.g.,

coeff  (x,3,x**4-7*x**3+5) 

evaluates  to  -7,  the  coefficient  of  x  raised  to  the  third  power. 

5.  EXTENSION  FACILITIES.  A  number  of  ways  of  extending 
SCRATCHPAD  are  available.  Of  these  we  briefly  mention  here  the 
facilities  provided  for  user-defined  procedures  and  syntax  extension. 

To  give  the  user  the  capability  of  executing  a  block  of  COMMANDS 
over  and  over,  a  facility  for  creating  procedures  is  provided.  Any 
COMMAND, preceded by a label of the form "n.m" for n and m integers,
e.g.,

    1.40  s=s+b*(y where x = a)

is  neither  translated  nor  interpreted,  but  rather  stored  by  the 
system  for  later  reference.  The  integer  "n"  is  the  procedure  number, 
and  "m"  the  COMMAND  number.  The  user  may  type  the  constituent 
COMMANDS  into  SCRATCHPAD  in  any  order.  When  all  COMMANDS  have  been 
entered, the user issues a COMMAND to create the procedure for
subsequent interpretation.

As an example, using the above recurrence relation for defining
Legendre polynomials, a procedure for obtaining the Legendre
polynomial of degree n, p<n>, can be written as follows:

    1.10  p<0> = 1
    1.20  p<1> = x
    1.30  m = 2
    1.40  return p<n> if m > n
    1.50  p<m> = ((2*m-1) * x * p<m-1> - (m-1) * p<m-2>)/m
    1.60  m = m+1
    1.70  go 40

together with

    legendre(n) = procedure (1), m local

Then the COMMAND

    legendre(6)

will cause p<6> to be calculated and typed out.

COMMAND steps in procedures may be inserted, replaced or deleted.
In the above procedure, if the user also wished to have p<m>,
m <= n, typed out, he need merely add the COMMAND

    1.55  p<m>

followed again by:

    legendre(n) = procedure (1), m local

The added COMMAND would be inserted between COMMANDS 50 and 60.

An  important  objective  of  any  system  which  hopes  to  answer  the 
needs  of  mathematicians  working  in  diverse  areas  is  to  enable  a  user 
to  introduce  and  use  the  specialized  notations  from  his  problem  area. 

To  accomplish  this  objective,  a  syntax  extension  facility  has  been 
provided  in  SCRATCHPAD.  This  facility  enables  a  user  to  extend  the 
base  language  in  a  convenient  way.  For  example,  the  notation  in  the 
base language for the absolute value of an EXPRESSION x is 'absval(x)'.
If  the  user  wishes  to  use  the  notation  |x|  for  absolute  value,  then 
he may issue the COMMAND:

    "|x|" = "absval(x)", x expression

The general form of a syntax extension COMMAND is

    "N" = "D", <qualifier>, <qualifier>, ...

where  N  is  a  new  notation,  and  D  is  its  definition  in  terms  of  known 
constructs  in  the  base  language  plus  all  extensions  to  date.  Further 
details on the syntax extension feature are contained in <5>.


EXPEDITING LARGE SCALE NUMERICAL COMPUTATIONS
BY THE USE OF TV TECHNIQUES

JENNY  BRAMLEY 

Computer  Sciences  Laboratory 
US  Army  Engineer  Topographic  Laboratories 
Fort  Belvoir,  Virginia  22060 

This is a preliminary report limited to the presentation of the
fundamentals involved in implementing the use of TV techniques for
carrying out numerical computations. The completed work will be
published in a periodical primarily devoted to computers. The
method described is mainly intended for the correlation and
processing of pictorial information. Thus it operates on experimental
inputs having as a rule an accuracy of 1-2%.

The search for an alternative to the ubiquitous digital
computer was motivated by the delays inherent in the input and
output functions, which offset the high speed of digital computation
and result in high cost of operation. The delays are further
compounded by the need for piecewise operation, since storage of
intensity data pertaining to even a relatively small picture, e.g.,
$10^6$ picture elements quantized to 64 gray levels, cannot be
accomplished cost-effectively in an inner computer memory.

The alternative selected is based on the use of a cathode ray
tube (CRT) in conjunction with equipment for light flux measurement
to perform fully automated multi-dimensional mensuration. The
method will be explained below on the evaluation of the
correlation integral

$$I(s) = \int_{a}^{a+L} f(x)\, g(x-s)\, dx.$$

The analog system needed for this purpose includes the
following components:

1. A TV monitor for about 1,000 line presentation with
a CRT having a screen of very short persistence.

2. A detector, which integrates the light flux received
during one frame period T, together with the associated optical
system to focus the light output of the CRT on it.

3. Small scale video storage, such as a video disc.

4. Means for the utilization of the results obtained.
These would either be displays for direct viewing, or the output
could be digitized and fed into a digital computer. Still another
possibility of special interest in correlation work is to use the
output as a control for further processing.

The  use  of  TV  techniques  imposes  a  number  of  constraints  on  the 
variables:  All  quantities  must  be  single-valued,  positive  and 
normalized  to  fall  within  specified  ranges.  The  first  requirement 
is basic, the second one can be satisfied by adding a bias
constant, which is then subtracted from the final result, while the
third one merely calls for suitable units.

All the mensuration operations are performed within the CRT
raster ABCD, a square of side L, as shown in Figure 1. The
variable x along the line AB can take on all integral values from 1 to
N, the total number of raster lines. (In mathematical terms, a
scan or raster "line" is a narrow band comprised between two
straight lines.) A suitable value is N ≈ 1000, readily available in
commercial high resolution TV systems.

FIGURE 1

To plot a function f(x), adjust the scale so that the numerical
value of f(x) falls between 0 and L. It is not restricted to
integral multiples of L/N (corresponding to N resolution points).

The plotting is performed by scanning with the electron beam along
all successive horizontal scan lines numbered 1,...,n,...,N. Along
any  line  n,  the  beam  current  is  cut  off  at  a  vertical  location 
whose  distance  from  AB  is  f(n).  This  functional  representation  is 
rotated  by  90°  from  conventional  mathematical  plots  to  conform  to 
the  conventional  horizontal  scanning  of  a  CRT. 

Integration under the curve f(x) is performed by an integrating
light detector which measures the light output from the CRT
screen during a frame time T much shorter than the 1/30 sec of
conventional TV. The screen persistence must be shorter than the scan
duration of one line, otherwise the lines that are scanned first
contribute more to the integrated light output than those that are
scanned last. Under these conditions, the light flux is proportional to

$$\sum_{n=1}^{N} f(n)\, \frac{L}{N},$$

the conventional expression for the numerical evaluation of the
integral

$$\int_{a}^{a+L} f(x)\, dx.$$
This procedure is applicable to the evaluation of the integral

$$\int_{a}^{a+L} f(x)\, g(x-s)\, dx,$$

i.e., of the sum

$$\sum_{n=1}^{N} f(n)\, g(n-s)\, \frac{L}{N}$$

approximating it, provided g(x-s) represents the brightness of the line
specified by x. (Obviously, for every value of s, there is a
different set of g values.) Any particular term in the sum, e.g.,
f(n)g(n-s)L/N, represents the light flux from scan line n, of width
L/N, scanned at a brightness g(n-s) for a length f(n) starting from
the left edge of the raster, i.e., from the line AB in Figure 1.
Therefore, the light flux integrated over the frame time T
represents a numerical value for the integral I(s). The function g
originates as a video signal V applied to the circuitry of the CRT. To
be able to generate on command any desired value of the line
brightness, it is necessary that g be proportional to V. This is
accomplished by passing the video signal through a gamma amplifier which
matches the gamma of the CRT.
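
The sum that the integrating detector accumulates can be imitated numerically. The following Python sketch is an illustration only: the function name `correlation_sum` is hypothetical, f and g are assumed to be sampled on scan lines 1 to N, and g is assumed to be zero for lines falling outside the raster, a boundary convention the text does not specify.

```python
def correlation_sum(f, g, s, L=1.0):
    """Approximate I(s) by the sum over n = 1..N of f(n) * g(n - s) * L / N.

    f[n-1] is the scan length on line n and g[k-1] the brightness sample
    for index k. Samples with n - s outside 1..N are taken as zero
    (an assumption; the source does not state a boundary rule).
    """
    N = len(f)
    total = 0.0
    for n in range(1, N + 1):
        k = n - s
        if 1 <= k <= N:
            # light flux from line n: length f(n) at brightness g(n - s)
            total += f[n - 1] * g[k - 1]
    # each scan line has width L / N
    return total * L / N


if __name__ == "__main__":
    f = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 1.0, 1.0, 1.0]
    print([correlation_sum(f, g, s, L=4.0) for s in range(0, 3)])
```

Every term is a product of two non-negative quantities, which mirrors the constraint that scan lengths and brightnesses on the CRT cannot be negative.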


As far as the speed of operation is concerned, each of the N
products in the sum representing I(s) can be readily formed in 1
µsec, particularly if the CRT is of the electrostatic deflection
type. The 1 µsec time interval takes care of the writing rate of the
tube, the response of the phosphor screen, and the response of the
photomultiplier. The retrace time from the end of one scan line to
the beginning of the next one is a fraction of 1 µsec. If f(x) is
a processing function and is recorded on a video disc, there is no
difficulty in sensing it during the retrace time, particularly since,
by definition, f(x) is a single step function on every scan line.
Therefore, the sum of 1000 terms will require little more than 1
millisecond. If the values of g are obtained by scanning the
picture with a flying spot scanner, that operation can be slowed down
to synchronize with the summation process.

If I(s) represents a correlation integral in the comparison of
two sets of pictorial data, then f(x) is also an intensity and is
introduced into the system as a grid voltage drive. Since the
procedure requires that f(x) specify the length of a scan, the signal
must be transformed to the scale of a horizontal deflection voltage
by an operational amplifier. Under ordinary circumstances
intensities are not obtainable to 0.1%, so that it is superfluous to use
a line of 1000 resolution elements to represent the maximum
intensity value. A much shorter scan length can be used with attendant
increase in speed. Thus 1 µsec will cover both the writing and the
retrace time, allowing 1000 correlations per second of 1000 points
each.


Another mathematical operation of special interest in image
processing is the Fourier transform. Let g(n,m) (n and m ranging from
1 to N) represent the intensities of the $N^2$ successive elements of
a picture arranged in N rows and columns. It is a multiple step
function, constant over the area of any element and changing only
at its boundaries. The sine and cosine transforms of the function
g(n,m) are given by

$$C(n,\mu) = \sum_{m=1}^{N} g(n,m) \cos\frac{2\pi m\mu}{N},$$

$$S(n,\mu) = \sum_{m=1}^{N} g(n,m) \sin\frac{2\pi m\mu}{N},$$

$$C(n,\mu) + iS(n,\mu) = G(n,\mu),$$

but these sums cannot be formed directly by the flux integration
method since sines and cosines can assume both positive and negative
values.

We introduce, therefore, the processing functions

$$f_c(m,\mu) = 1 + \cos\frac{2\pi m\mu}{N},$$

$$f_s(m,\mu) = 1 + \sin\frac{2\pi m\mu}{N},$$

which give

$$C(n,\mu) + \sum_{m=1}^{N} g(n,m) = \sum_{m=1}^{N} g(n,m)\, f_c(m,\mu),$$

$$S(n,\mu) + \sum_{m=1}^{N} g(n,m) = \sum_{m=1}^{N} g(n,m)\, f_s(m,\mu).$$

The summation term on the left represents the sum of the intensities
of all the picture elements on line n. It is a bias level added to
the Fourier transform to keep it positive. The sums on the right
are formally identical with the sum representing the integral I(s)
and can be evaluated in exactly the same way.
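
The bias argument can be verified numerically for a single picture line. In the Python sketch below the function names are hypothetical and the intensities in `g_row` are made-up sample values. It forms the sum with the non-negative processing function 1 + cos(2πmµ/N), subtracts the bias (the plain sum of the intensities), and compares the result with the cosine sum computed directly.

```python
import math


def cosine_direct(g_row, mu):
    """C(n, mu) for one picture line: sum of g(m) * cos(2 pi m mu / N)."""
    N = len(g_row)
    return sum(g * math.cos(2 * math.pi * m * mu / N)
               for m, g in enumerate(g_row, start=1))


def cosine_via_bias(g_row, mu):
    """Same quantity using only non-negative products, then removing the bias."""
    N = len(g_row)
    # processing function 1 + cos(...) lies between 0 and 2, never negative
    biased = sum(g * (1.0 + math.cos(2 * math.pi * m * mu / N))
                 for m, g in enumerate(g_row, start=1))
    bias = sum(g_row)  # sum of the intensities on the line
    return biased - bias


if __name__ == "__main__":
    g_row = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # illustrative intensities
    for mu in range(1, len(g_row) + 1):
        print(mu, cosine_direct(g_row, mu), cosine_via_bias(g_row, mu))
```

The sine sum is handled the same way with 1 + sin(2πmµ/N) in place of the cosine term.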

Presumably, the purpose of a Fourier transform is a filtering
operation in frequency space. The procedure, therefore, is as
follows:

1. Perform the Fourier transform of the input.

2. Write the results on the video disc as they are being
obtained.

3. Read the results off the disc in order to multiply
them by the filter function provided as one of the inputs.

4. Store this second set of results on the video disc.

5. Read the results off the disc and transform them back
to ordinary coordinate space.

While there may be degradation each time information is written
on the disc and is read back, it should not be significant with
a  good  quality  disc  and  associated  electronics.  However,  additional 
transfers  to  and  from  storage  may  have  to  be  limited  as  long  as  the 
electronic  components  are  limited  to  those  readily  available. 

Consider now the consequences of these limitations on the
processing times involved. If every real or imaginary term in the
summation takes 1 µsec, then each term in the corresponding transform
is formed in 1 msec, so that about 17 minutes are needed for a
one-dimensional transform of a $10^3 \times 10^3$ element picture,
counting both input and output times. Fourier transforms on a digital
computer are carried out using the Cooley-Tukey algorithm; this method
constitutes, therefore, the standard of comparison for the time
required to perform the operation. Multiplexing is possible on a
CRT computer, though the auxiliary coefficients that have to be
computed require a large number of transfers to and from storage.

The parameters of the video disc and its associated equipment
needed for the Cooley-Tukey approach with specified limits of
error will be discussed in a subsequent paper. However, the first
step of this approach may be practical at present. If $N = r^2$, we
break up the N term summation in the Fourier transform into two
summations of r terms each. Thus, for every value of n and µ,
there are 2r, i.e., $2\sqrt{N}$ operations to be performed.

$$m = m' + m''r, \qquad 0 \le m'' \le r-1, \qquad 1 \le m' \le r$$

$$\mu = \mu' + \mu''r, \qquad 0 \le \mu'' \le r-1, \qquad 1 \le \mu' \le r$$

$$G(n,\mu',\mu'')$$

$$W = e^{2\pi i/N}$$
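
The index splitting can be illustrated with a short numerical check. The Python sketch below uses the standard two-factor decomposition of a length-N sum with N = r², written with W = exp(2πi/N); it is an assumed form consistent with the index ranges given here and not necessarily the author's exact expression. The names `dft_direct` and `dft_two_stage` are hypothetical.

```python
import cmath


def dft_direct(g):
    """Sum over m = 1..N of g(m) * W**(m*mu), for mu = 1..N, W = exp(2 pi i / N)."""
    N = len(g)
    W = cmath.exp(2j * cmath.pi / N)
    return [sum(g[m - 1] * W ** (m * mu) for m in range(1, N + 1))
            for mu in range(1, N + 1)]


def dft_two_stage(g, r):
    """Same sums with m = m1 + m2*r (1 <= m1 <= r, 0 <= m2 <= r-1), N = r*r."""
    N = r * r
    if len(g) != N:
        raise ValueError("len(g) must equal r*r")
    W = cmath.exp(2j * cmath.pi / N)
    # Stage 1: r-term inner sums, one per pair (m1, mu1). They depend on mu
    # only through mu1, because W**(m2 * r * mu2 * r) = W**(N * m2 * mu2) = 1.
    inner = {}
    for m1 in range(1, r + 1):
        for mu1 in range(1, r + 1):
            inner[(m1, mu1)] = sum(g[m1 + m2 * r - 1] * W ** (m2 * r * mu1)
                                   for m2 in range(r))
    # Stage 2: r-term outer sum for every mu = mu1 + mu2*r.
    out = []
    for mu in range(1, N + 1):
        mu1 = (mu - 1) % r + 1  # 1 <= mu1 <= r
        out.append(sum(inner[(m1, mu1)] * W ** (m1 * mu)
                       for m1 in range(1, r + 1)))
    return out


if __name__ == "__main__":
    g = [float(k % 5 + 1) for k in range(16)]  # illustrative data, N = 16, r = 4
    a, b = dft_direct(g), dft_two_stage(g, 4)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

Each output value then costs one r-term inner sum plus one r-term outer sum, which is the count of 2r operations quoted in the text.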

If N = 1024 or $32^2$, the processing time for the entire picture
is cut down from $1024^3$ µsec to $64 \times 1024^2$ µsec, which is just in
excess of one minute. The regular Cooley-Tukey approach requires
about 10 sec.
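
The timing figures follow from simple arithmetic, which the Python sketch below reproduces. It assumes 1 µsec per term, as stated in the text; the function names are hypothetical.

```python
import math

MICROSECOND = 1e-6  # seconds per term, as assumed in the text


def direct_seconds(N):
    """N rows x N transform values per row x N terms per value."""
    return N ** 3 * MICROSECOND


def two_stage_seconds(N):
    """N rows x N transform values per row x 2*sqrt(N) terms per value."""
    r = math.isqrt(N)
    if r * r != N:
        raise ValueError("N must be a perfect square")
    return 2 * r * N ** 2 * MICROSECOND


if __name__ == "__main__":
    print(direct_seconds(1000) / 60.0)    # minutes, 1000 x 1000 picture
    print(direct_seconds(1024) / 60.0)    # minutes, 1024 x 1024 picture
    print(two_stage_seconds(1024))        # seconds, with the r = 32 splitting
```

For N = 1000 the direct method gives 1000 seconds, or about 17 minutes; for N = 1024 the splitting with r = 32 gives about 67 seconds, the "just in excess of one minute" quoted above.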


On the basis of the considerations presented here, I believe
we may conclude that it is time to stop tying down expensive
digital computers with the processing of vast amounts of experimental
data. Except where high precision is required, it is much more
cost effective to use TV type equipment. At the very least, it can
serve as a first approximation to find data that merit more
precise consideration.


COMPUTER SIMULATION: AN ESSENTIAL APPROACH TO SOLVE
COMPLEX BIOLOGICAL AND ENVIRONMENTAL PROBLEMS


F.  Heinmets 

Pioneering  Research  Laboratory,  U.  S.  Army  Natick  Laboratories, 

Natick,  Mass. 

When viewed at a realistic level, the common property of many
biological and environmental systems is that they are composed of
numerous sets of entities involved in a multiplicity of modes of
interaction. Both systems are dynamic. Time dependence of these
phenomena is a basic feature and has to be accounted for in the modeling
process. We propose to consider here some essential features of
modeling for both systems and we will attempt to point out difficulties
envisioned in this type of work. It is self-evident, without the need
of further elaboration, that only via computer simulation will it be
possible to unravel the operational characteristics of such systems
and evaluate the regulatory features. While many smaller problems
can be studied and solved by conventional methods and computational
facilities, such an approach is inadequate for large-scale problems,
for example: global aspects of water and air pollution, modification
of climate by environmental factors, mechanisms of diseases and
mental disorders, etc. In order that these problems may be realistically
studied and solved by computer simulation techniques, many new
methods have to be introduced in terms of operational procedure as
well as computer techniques and hardware. In addition, new kinds
of institutional and organizational patterns have to be developed in
order to deal with multidisciplinary aspects of the problems (1,2).
Here we consider only the problems for computer simulation and
technical difficulties which are inherent in such an approach.

This paper has been reproduced photographically from the author's manuscript.

Once a problem is selected for computer simulation and
model development, it is desirable to establish a basic procedure
to deal with the problem. The following general steps can be
proposed:

1. Assemble a competent multidisciplinary group of
professionals to establish the framework for modeling.

2.  Data  collection  from  information  storage  and  other 
sources. 

3.  Preliminary  outline  and  scope  of  model  systems. 

4.  Rough  formulation  of  the  problem  and  estimation  of 
number  of  differential  equations  required. 

5.  Computer  selection: 

a.  If  suitable  computers  are  available  -  selection  has  to 
be  made  in  order  to  gear  other  steps  with  operational  characteristics 
of  the  computer. 

b.  If  suitable  computers  are  not  available  via  commercial 
sources,  the  only  alternative  is  a  program  to  develop  a  new  computer 
and  use  this  for  computer  simulation. 

6. Develop a detailed model, formulate it mathematically, perform
computer simulation, and compare model system performance with
the actual system.

7. Determine the limitations and inadequacies of the model
and improve the model, which includes further data collection. This
step has to be repeated several times in order to obtain a realistic
model system performance via computer simulation.

8.  Once  a  realistic  model  has  been  established: 

a.  Study  the  dynamic  performance  of  the  system  and 
carry  out  parametric  analysis. 

b.  Determine  extreme  conditions  where  system 
becomes  irreversibly  uncontrollable  and  study  the  consequences; 
compare  with  the  actual  system  in  nature. 

9.  Recommend  on  the  basis  of  computer  simulation: 

a.  The  permissible  levels  of  various  entities  in 
the  system  and  establish  tolerance  ranges. 

b.  Make  recommendations  for  correcting  the  behavior 
of  the  system  in  nature. 

c.  Provide  data  for  the  enactment  of  new  laws  and 
environmental  standards. 

The  computer  problem  and  simulation.  The  facts  are: 

1. There is no computer on the market capable of solving
large-scale dynamic systems in biology and environment-related
problems.

2. It is impractical for industry to develop such a computer,
since only a few of these are needed and their operational
characteristics are in a special category.

3. The only feasible way to overcome this problem is to carry
out government-supported special computer development and
establish a few computer centers in the country.

4.  Specifications  for  such  computer  design  have  to  be 
established  by  a  group  of  competent  scientists: 

a.  Specialists  from  various  problem  areas. 

b.  Specialists  in  the  computer  field. 

c.  Specialists  in  administrative  procedures  for 
the  computer  utilization. 

Who is going to make the decision for such computer development,
and who is going to fund it?

The  answer  is:  There  is  no  government  organization  to  do  it, 
and  there  is  no  interdisciplinary  competence  in  any  government 
branch  to  take  the  initiative  and  carry  out  such  steps!  For  such  purpose, 
new  institutions  are  required  (2). 

The  following  example  will  serve  to  focus  the  issue. 

A practical model of a metabolic process is presented to
illustrate the character of biological problems when analyzed in
realistic terms. Tryptophan metabolism in the pineal gland, which has
recently been simulated on the computer, will serve as such an
example. Fig. 1 shows the basic metabolic scheme, Table 1 provides
the list of symbols, and Table 2 the flow equations. The original paper
should be consulted for problem formulation and computer simulation
results. The system entails 40 simultaneous differential equations.
The program scheme for the analog computer is shown in Fig. 2. After
an operational model system has been established on the computer,
the time dependence and interrelationship of all functional entities can
be studied. Technical details for organizing the problem on the
computer can be found in another publication (4). Computer simulation
reveals that such a system can be stationary (Fig. 3) when the
regulatory compound (a_1) is absent, but will become oscillatory when
it is present (Fig. 4 and 5). A more detailed relationship between
various functional entities during the oscillatory cycle is presented
in Fig. 6. A systematic study of the dynamics of the model provides
essential information in regard to the basic biological functioning of the
system.
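
To indicate how flow equations of this kind turn into simultaneous differential equations, the sketch below treats only the first two entries of Table 2 (s_1 + E_1 -> s_2 + E_1 and s_2 + E_2 -> s_3 + E_2). It is an illustration under stated assumptions, not the author's program: mass-action rate laws are assumed, the enzyme levels are held constant, the rate constants and initial values are hypothetical, and a digital forward-Euler integration stands in for the analog computer.

```python
def simulate(k1, k2, e1, e2, s1_start, dt, steps):
    """Integrate a two-step chain s1 -> s2 -> s3 by forward Euler.

    k1, k2   : hypothetical rate constants for flow equations 1 and 2
    e1, e2   : enzyme levels E_1 and E_2, assumed constant here
    s1_start : initial external tryptophan (s_1); s_2 and s_3 start at zero
    """
    s1, s2, s3 = s1_start, 0.0, 0.0
    history = [(0.0, s1, s2, s3)]
    for n in range(1, steps + 1):
        v1 = k1 * s1 * e1          # assumed mass-action rate of equation 1
        v2 = k2 * s2 * e2          # assumed mass-action rate of equation 2
        s1 -= v1 * dt
        s2 += (v1 - v2) * dt
        s3 += v2 * dt
        history.append((n * dt, s1, s2, s3))
    return history


if __name__ == "__main__":
    for t, s1, s2, s3 in simulate(1.0, 0.5, 1.0, 1.0, 1.0, 0.01, 1000)[::250]:
        print(f"t={t:5.2f}  s1={s1:.4f}  s2={s2:.4f}  s3={s3:.4f}")
```

The full model would carry one such rate term for every unstarred flow equation, which is how a scheme of this size reaches 40 coupled equations.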

One has to consider that living species and the environment at
large contain many oscillatory systems coupled together within the
total aggregate system. For example, cyclic phenomena in
reproductive physiology present indeed a formidable problem for
computer simulation. However, regulation and control of reproduction
of species can be adequately dealt with only when the basic mechanisms of
reproductive processes are understood. Needless to say, without a
"new approach" there is little hope that large-scale environmental
problems and complex human diseases can be properly understood or
adequate solutions found.
TABLE 1

FUNCTIONAL ENTITIES AND SYMBOLS

E_0 - A group of enzymes induced by substrate s
E_1 - Transport enzyme for tryptophan
E_2 - Tryptophan 5-hydroxylase
E_3 - Tryptophan 5-HTP decarboxylase
E_4 - Monoamine oxidase (MAO)
E_[illegible] - Alcohol dehydrogenase
E_6 - Hydroxy indole methyl transferase (HIOMT)
E_5 - Aldehyde dehydrogenase
E_[illegible] - Acetylating enzyme
E_c^1 - Adenyl cyclase
E_c^2 - Phosphodiesterase
A_[illegible] - Inhibitor of E_c^[illegible]
E_t - Transport enzyme
P_i - Internal pool
R - Inactive repressor
[illegible] - Active repressor
G_i - Genes
M - Messenger RNA
B - Ribosome
[illegible] - Activator for M
[illegible] - Inactive activator
[illegible] - C_2 storage receptor (inactive)
[illegible] - Active storage receptor
E_r - Norepinephrine release enzyme ([illegible] = inactive form)
L - Light
C_p - Parenchymal cell storage receptor
C_n - Nerve cell storage receptor
Eq - Total enzymes synthesized as the result of a_[illegible] promotion
     effect (E_2, E_4 and A_[illegible])
s_4^[illegible] - Unbound s_4 in nerve cell
s_1 - External tryptophan
s_2 - Internal tryptophan
s_3 - 5-Hydroxy tryptophan (5-HTP)
s_4 - 5-Hydroxy tryptamine (5-HT)
s_[illegible] - 5-Hydroxy indole acetate
s_[illegible] - Tryptamine
s_[illegible] - 5-Hydroxy tryptophol
s_7 - 5-Methoxy tryptophol
[illegible] - 5-Methoxy serotonin
[illegible] - N-acetyl serotonin
s_[illegible] - Melatonin
s_6^[illegible] - 5-Hydroxy indole acetic acid
s_7^[illegible] - 5-Methoxy indole acetic acid
[s_4]_p - External (pool)
[s_[illegible]]_p - External s_[illegible] (pool)
[s_4 C_p] - s_4 stored in the parenchymal cell
[s_4 C_n] - s_4 stored in the nerve cell
s_4^[illegible] - Unbound s_4 in nerve cell
a_1 - Norepinephrine ([a_1]_s = stored)
a_2 - E_c^2 inhibitors
i_[illegible] - Cycloheximide
i_[illegible] - Actinomycin D
[illegible] - ATP
C_2 - cAMP ([illegible] = bound form)
[illegible] - AMP
i_j - Serotonin storage transport inhibitors
[illegible] - Various enzyme inhibitors
TABLE 2

FLOW EQUATIONS FOR THE MODEL SYSTEM

No.    Flow equation                                   Rate constant (k)
                                                       subindex numbers

1.     s_1 + E_1 -> s_2 + E_1                          1
2.     s_2 + E_2 -> s_3 + E_2                          2
3.     i_2 + E_2 -> [i_2 E_2] -> E_2                   3, 4*
4.     s_3 + E_3 -> s_4 + E_3                          5
5.     i_3 + E_3 -> [i_3 E_3] -> E_3                   6, 7*
6.     s_4 + E_4 -> [illegible]                        8
7.     s_4 + [illegible] -> [illegible]                9
8.     s_4 + E_6 -> [illegible]                        10*
9.     s_4 <-> [s_4]_p                                 11, -11
10.    i_4 + E_4 -> [illegible] -> E_4                 12, 13*
11.    s_5 + E_5 -> s_6 + E_5                          14
12.    i_5 + E_5 -> [i_5 E_5] -> E_5                   15, 16*
13.    [illegible] + E_6 -> s_7 + E_6                  17
14.    i_6 + E_6 -> [i_6 E_6] -> E_6                   18, 19*
15.    s_5 + [illegible] -> [illegible]                20
16.    [illegible] + E_6 -> [illegible] + E_6          21
17.    [illegible] + E_6 -> [illegible] + E_6          22, 23
18.    [illegible] <-> [illegible]_p                   24, -24
19.    s_4 + C_p <-> [s_4 C_p]                         25, -25
20.    i_j + C_p -> [i_j C_p] -> C_p                   26, 27*
21.    s_4 + E_t -> [illegible] + E_t                  [illegible]
22.    [illegible] + C_n <-> [illegible] -> [illegible]   29, -29, 30
23.    [illegible] -> X                                69
24.    i_j + E_t -> [i_j E_t] -> E_t                   31, 32*
25.    G_r + P_i -> G_r + R                            [illegible]
26.    R + [illegible] <-> [illegible]                 33, -33
27.    G_a + P_i -> G_a + [illegible]                  34
28.    [illegible] + R <-> [illegible]                 35, -35
29.    [illegible] + C_2 -> [illegible]                37
30.    G_i + P_i -> G_i + M                            [illegible]
31.    M + [illegible] -> [illegible] -> M             39, 38
32.    M + P_i -> E_0 + M                              40
32A.   E_0 -> X                                        41
33.    [illegible]                                     42
34.    E_0 -> [illegible]                              43
35.    E_0 -> [illegible]                              44
36.    E_0 -> [illegible]                              45
37.    [illegible]                                     [illegible]
38.    [illegible] + E_c^1 -> [illegible] -> E_c^1     48, [illegible]
39.    C_1 + [illegible] -> C_2 + [illegible]          53*
40.    C_1 + [illegible] -> C_2 + [illegible]          51
41.    C_1 + [illegible] -> C_2 + [illegible]          52
42.    C_2 + E_c^2 -> C_3 + E_c^2                      78
43.    [illegible] + E_c^2 -> [illegible] -> E_c^2     79, 80*
44a.   a_1 + [illegible] -> [illegible]                56, 55
44b.   [illegible] + C_2 -> [illegible]                54, 50
45.    [illegible] -> X                                57
46.    [illegible] -> X                                58
47.    [illegible] -> X                                59*
48.    [illegible] -> X                                60
49.    a_2 -> X                                        63
50.    C_2 -> X                                        64
51.    [illegible] -> X                                65
52.    [illegible] -> X                                66
53.    [illegible] -> X                                67
54.    [illegible] -> X                                68
55.    [illegible] -> X                                70
56.    [illegible] -> X                                71
57.    [illegible] + E_r -> a_1                        72*
58.    E_r + L <-> [illegible]                         73, -73
59.    s_2 + [illegible] -> [illegible]                74
60.    s_1 -> X                                        75
61.    s_5 -> X                                        76
62.    A_1 -> X                                        77

Note: Equations indicated by a star (*) are not included in the differential
equations. However, their role is simulated on the analog computer
by manipulation of appropriate rate constants. Details will be
discussed in the text.

F. Heinmets, Comput. Biol. Med. (in print).
Figure 3. Stationary concentration levels of various substrates
in the system where a_1 = 0 (4). Horizontal axis: time.

Figure 4. Oscillatory relation between melatonin and serotonin
acetylating enzyme (4). Horizontal axis: time.

Figure 5. Oscillations of serotonin, melatonin and
N-acetyl-serotonin as function of time (4).

Figure 6
THE USE OF MULTIPLE REGRESSION EQUATIONS IN THE CAMPAIGN EXECUTION MODEL

Sol Haberman and Norman T. Rasmussen


ABSTRACT. Ships in task force operating areas are subjected to air
attacks. For all such attacks a basic regression equation type is used
which represents a family of least squares fits to the best available
data. Three kinds of data are used as independent variables: the
effective number of the various types of resources in the attacked force,
the quality of the defender's missiles, and the number of attackers.
The equation is used repetitively with different sets of prestored
coefficients and powers in order to predict the mean number of hits on
units of both sides. These hits are both calculated and accumulated
in order to record damage levels individually by hulls for specific
units of the various types. Deletion of units occurs when input
thresholds of damage (number of hits for this type) have been reached. An
input criterion determines the vulnerability of units to damage by
surviving hulls.

These calculations have been organized so that their logic may be
applied to all forms of attack and damage, not only to the immediate
air warfare requirement.

I. INTRODUCTION. The Campaign Execution Model is a computer
program for simulation of a major war at sea. It was originally conceived
by Arthur W. Pennington and Jerome Bracken.

The purpose of this paper is to discuss some of the recent
developments in the Campaign Execution Model portion of the procedure, the
whole of which is known as the IDA Campaign and Allocation Model.*

The CEM, which is the focus of this discussion, is designed to deal
with a wide range of scenarios representing threats, resources, and
strategic objectives within the context of a major conventional war
in support of current strategy. Its specific detailed function is to
do bookkeeping: to keep track of resources, force interactions,
resource attrition, and military accomplishments within the scenario
framework set by the user. Data representing force effectiveness and
encounter rates, which are CEM inputs, are obtained from external analyses.
The value of a given outcome is determined by the quality of these
external analyses.

The subject matter of this paper is the new representation
of the air attack on ships making up the various task forces.


*The IDA Campaign and Allocation Model, of which the CEM is a part, as is
the Sequential Unconstrained Minimization Technique (SUMT), was assigned
to the Planning Analysis Group of APL/JHU by the Director, Systems
Analysis Division (OP-96) in April 1969.

This paper has been reproduced photographically from the author's manuscript.

The  revised  CEM  model  has  been  tested  and  used  on  both  the  CDC-1604 
and  IBM  360/91  computers.  The  program  is  written  in  FORTRAN  IV. 

The procedures which led to the major revision of the air attack had
their origin in some modest initial changes in the antisubmarine portions
dealing with detection probabilities.

Historically, the initial studies of the CEM logic indicated a need
to change the method of calculating the probabilities of detection of
RED submarines by task forces, convoys, amphibious assault groups (AMP)
and underway replenishment groups (URG). The procedure finally adopted
was one which utilizes curves to represent detection probabilities for a
range of BLUE and RED types and numbers of forces. The curves were
derived from data developed for the ASW Force Level Study.

The ASW Force Level Study source data was available in the form of
probabilities of detection of a single RED submarine, where each probability
was a function of the range between the center of the RED force and the
center of the BLUE force. Probability estimates were given for varying
numbers of "Poor" (SQS-23) and "Good" (SQS-26, SQQ-23) BLUE escorts and
for varying numbers of RED nuclear and conventional submarines. The
probabilities were combined in four ways, "Poor" vs. Nuclear, "Poor" vs.
Conventional, "Good" vs. Nuclear and "Good" vs. Conventional, and were
restated in the form of curves, using least squares, as shown schematically
in Table I. More recent information indicates that these curves are not
entirely adequate at the lower end. The curves give probabilities of
detection which are always zero for all types of forces when no escorts
are present. With operational and cost limitations on the numbers of
escorts which can be assigned, the curves cannot and do not in practice
reach probabilities of 1.00 at the other end.

Since many interactions involve mixes of units on both sides,
interpolations among the set of four curves become necessary. These
interpolations give the same results whether they are performed first between
RED submarine types (Nuclear vs. Conventional) or between BLUE escorts
(Poor or Good).

The expression used for the detection of a RED submarine is:

$$\text{Probability} = \frac{P \cdot N \cdot PN + P \cdot C \cdot PC + G \cdot N \cdot GN + G \cdot C \cdot GC}{(P + G)(N + C)}$$

where

P = number of Poor BLUE Escorts
G = number of Good BLUE Escorts
N = number of RED Nuclear Submarines
C = number of RED Conventional Submarines

and

PN = Prob. of Detection of RED Sub (Given Poor Escorts vs. RED Nucs)
PC = Prob. of Detection of RED Sub (Given Poor Escorts vs. RED Conventionals)
GN = Prob. of Detection of RED Sub (Given Good Escorts vs. RED Nucs)
GC = Prob. of Detection of RED Sub (Given Good Escorts vs. RED Conventionals)


Probabilities of Detection of a Single RED Submarine
Task Forces (Subroutine "WARCTF")

$PN = K_{PN} \sqrt{E}$
$PC = K_{PC} \sqrt{E}$
$GN = K_{GN} \sqrt{E}$
$GC = K_{GC} \sqrt{E}$

PN = Poor Sonar vs. Nuclear Submarine
PC = Poor Sonar vs. Conventional Submarine
GN = Good Sonar vs. Nuclear Submarine
GC = Good Sonar vs. Conventional Submarine
K = Numerical constants developed by least squares calculations
E = Number of BLUE Escorts

TABLE I


Note: The equations for probabilities of detection used by Amphibious
Forces, Underway Replenishment groups and Convoys are the same
as above. The differences among these forces are represented by
K values.
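
The statement that the interpolation gives the same result in either order can be illustrated with a short sketch. Several things here are assumptions rather than facts from the source: each curve is taken to have the form K times the square root of E, E is taken to be the total number of escorts present (P + G), the K values are invented, the clipping at 1.0 is added for safety, and the interpolation is taken to be a simple weighting by numbers of units.

```python
import math

# Hypothetical least-squares constants; the real K values are not given here.
K = {"PN": 0.10, "PC": 0.15, "GN": 0.20, "GC": 0.30}


def curve(k, escorts):
    """One detection curve, assumed to be K * sqrt(E), clipped at 1.0."""
    return min(1.0, k * math.sqrt(escorts))


def detect_subs_first(P, G, N, C, k=K):
    """Interpolate between RED submarine types first, then BLUE escort quality."""
    if P + G == 0 or N + C == 0:
        return 0.0                      # no escorts (or no submarines): zero
    E = P + G                           # assumption: E counts all escorts
    poor = (N * curve(k["PN"], E) + C * curve(k["PC"], E)) / (N + C)
    good = (N * curve(k["GN"], E) + C * curve(k["GC"], E)) / (N + C)
    return (P * poor + G * good) / (P + G)


def detect_escorts_first(P, G, N, C, k=K):
    """Interpolate between BLUE escort quality first, then RED submarine types."""
    if P + G == 0 or N + C == 0:
        return 0.0
    E = P + G
    nuclear = (P * curve(k["PN"], E) + G * curve(k["GN"], E)) / (P + G)
    conventional = (P * curve(k["PC"], E) + G * curve(k["GC"], E)) / (P + G)
    return (N * nuclear + C * conventional) / (N + C)


if __name__ == "__main__":
    print(detect_subs_first(2, 3, 1, 4), detect_escorts_first(2, 3, 1, 4))
```

Because both routines reduce to the same weighted sum over the four curves divided by (P + G)(N + C), the two orders agree apart from rounding.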


II. BACKGROUND OF AAW SUBROUTINE REVISION. Major effort was
directed to developing new concepts for and rebuilding the AAW
subroutines of the CEM known as TASKAIR. This set of subroutines
is vital to the operation of the CEM because in it is conducted the
entire RED air and cruise missile attack against the BLUE carrier task
force. The interactions in TASKAIR can have a considerable effect on
BLUE's conduct and success in the war.

The model used for the AAW study was the Systems Interaction
Model II (SIM II), which was developed by APL/JHU and which has been
used extensively by the Navy (primarily the assistant DCNO for War
Gaming matters (OP-06C)) for about six years. The SIM II model is a
Monte Carlo event-store simulation of naval AAW interactions.

The AAW study conducted with SIM II was not intended to be an
all-inclusive study of the AAW problem. The primary objective was to
study the various elements which affect the outcome of an AAW conflict
in order to determine within certain limits which are the important
variables and, determining this, to develop an improved flexible
methodology for representing AAW interactions in the CEM. The second
purpose for using SIM II was to obtain a reasonable AAW data base with
which real studies could be made.

The initial intent with respect to the use of this model was
to obtain data from which to derive an equation for predicting the
number of RED aircraft shot down and to incorporate this into the CEM. As
runs were being made it became evident that there were many additional
types of information available from the SIM II outputs which could also
be used in the CEM. An example of this is the number of hits on ship types
such as CV's, DLG's, DDG's, and DD's. It also became clear that the
SIM II study would produce information which could be expressed in
predictive equations representing the effectiveness of the various BLUE
weapon systems. These, while not immediately applicable to CEM, have
usefulness elsewhere.

In order to obtain the representative equations required to
summarize a typical range of RED and BLUE antiair warfare interactions,
SIM II was used to simulate a variety of attack carrier task force
dispositions which were subjected to RED air attack.*

RED raids varied in size from 30 to 150 bombers in steps of 15
and cruise missile attacks varied in size from 60 to 300 missiles in
steps of 30. Following the principle of maximum concentration of force,
each bomber raid consisted of five equal segments approaching the task
force in a 90° arc with all reaching ASM launch point at approximately
the same time.


*Further details on these interactions are available in the Naval Force
Methodology Study, APL/JHU/PAG No. 40-T1, CNO/OP-96/TR-3300, on pages
13-29.
Two basic categories of ships were used in 102 BLUE
dispositions. The first consisted of those types that contributed
directly to the AAW defense of the force by having the capability to
shoot down RED bombers and/or missiles. The second category of BLUE
ships were those that could not shoot down RED bombers and missiles,
but which did contribute indirectly to the outcome of the battle by
acting as additional targets for RED missiles to acquire and attack,
thereby reducing the number of hits taken by the CVA, CG, DLG, and
DDG types. The ships in this category included the CVS and DD's for
simulated ASW protection and, at times, an Underway Replenishment
Group (URG) and an amphibious force (AMP), each with its accompanying
ASW escorts.

The data summarized from SIM II* requires multiple regression
analysis because no single input variable by itself is adequate to
explain outcomes such as bombers shot down, hits on the CVA's, etc.
There are eight basic driving variables used to predict these outcomes:
i.e., numbers of CVA's, CG's, DLG's, DDG's, number of attackers, and
weapon performance level. Each game was analyzed in two ways: (1) as
an aircraft attack using ASM's, (2) as a cruise missile attack treating
the ASM's successfully launched as the attacking cruise missiles.

Each of 1,017 different games was repeated five times and mean
numbers were obtained for 47 different outcomes for the aircraft attack
and 34 different outcomes for the cruise missile attack.

III. THE NEW TASKAIR SUBROUTINES. The TASKAIR purpose remained the
same after all changes had been completed: to subject the BLUE ships in
each Task Force Operating Area to RED air attack in each ocean. RED
attack is by sub-launched cruise missile, by stand-off aircraft-launched
missiles, or by coordinated or uncoordinated combinations of these two.
Inputs control their order and coordination.

For all such attacks, a single basic equation type is used to
represent a least squares fit to the best available data as a means of
predicting the various results of the attack. The actual form of the
equation is that of the multiple-regression procedure to be described.
The equation uses three kinds of data as independent variables: the
effective number of the various types of BLUE resources in the attacked
force, the quality of the BLUE SAM missiles, and the number of RED
attackers. The equation is used repetitively with different sets of
prestored coefficients and powers in order to predict the mean number of
hits on BLUE and RED units. These hits are both calculated and
accumulated in order to record damage levels individually by hulls for the
CVA's, DLG's, DDG's, the CVS and the CG, and by class type for all other
units. Deletion of units occurs when input thresholds of damage (number
of hits for this type) have been reached. Also provided is an input
criterion that determines the vulnerability of units to damage by
surviving hulls on a "least-hits-to-the-most-damaged-unit" basis, or
a most-to-least basis, or in some weighted combination of these.

These calculations have been organized so that their logic may
be applied to all forms of attack and damage, not only to the immediate
air warfare requirement. Much of this procedure is now in use with RED
submarine torpedo attacks.


*The multiple regression procedure used here for least squares curve
fitting to the SIM II data is the BCC Library Routine No. 8.06.01,
adapted by J. Bramhall to the 7094 at APL. This follows a computational
method developed by M. A. Efroymson and is discussed in Chapter 6 of "Applied
Regression Analysis" by Draper and Smith (Wiley 1966).

Thus, TASKAIR performs five major functions:

1. Determines the total number of hits on all units,
either individually, or by unit type, by referencing
appropriate equations.

2. Combines these hits with past damage records.

3. Deletes "killed" units (i.e. units for which the
total damage exceeds an input kill threshold for the
given type of unit).

4. Reflects intelligence and BLUE doctrine by causing
the assignment of hits in subsequent attacks on
individual surviving ships of the CVA, DLG and DDG
type to be inflicted in a "least-hits-to-most-damaged-unit"
order, "most-to-least" order, or a weighted
combination of these.

5. Uses the number of hulls present for the CV's, DLG's
and the CG's so as to provide an expansion of the
performance criteria studied to include the number
of hull carrier-days on the line, CG days on the
line, etc.
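
Functions 2 through 4 can be sketched as a small bookkeeping routine. This is an illustration only and not the FORTRAN IV logic of TASKAIR. The function name, hull labels and data layout are hypothetical, a unit is treated as killed once its cumulative hits reach the threshold, and "most-to-least" is read here as giving the most hits to the most damaged unit, which the text does not spell out. The weighted combination is omitted.

```python
def apply_attack(damage, new_hits, kill_threshold, order="least_to_most_damaged"):
    """Combine predicted hits with past damage and delete killed hulls.

    damage         : dict of hull name -> cumulative hits from prior attacks
    new_hits       : predicted mean hits, one value per surviving hull
    kill_threshold : hits at which a hull of this type is deleted (assumed >=)
    order          : "least_to_most_damaged" gives the fewest new hits to the
                     most damaged hull; any other value pairs most with most
    """
    hulls = sorted(damage, key=damage.get, reverse=True)   # most damaged first
    hits = sorted(new_hits, reverse=True)                  # most hits first
    if order == "least_to_most_damaged":
        hits = hits[::-1]                                  # fewest hits first
    updated = {hull: damage[hull] + h for hull, h in zip(hulls, hits)}
    survivors = {hull: d for hull, d in updated.items() if d < kill_threshold}
    killed = sorted(hull for hull in updated if hull not in survivors)
    return survivors, killed


if __name__ == "__main__":
    past = {"CVA-1": 4.0, "CVA-2": 1.0}
    print(apply_attack(past, [3.0, 0.5], kill_threshold=6))
    print(apply_attack(past, [3.0, 0.5], kill_threshold=6, order="most_to_most"))
```

With the first ordering both hulls survive; with the second, the already damaged hull absorbs the larger share and is deleted.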

IV. DETERMINATION OF HITS. A schematic example of the multiple
regression equation used to predict the mean number of hits on the CVA
which gets hit worst, or second worst, etc. by a RED aircraft attack is:

$$\text{HITS} = C_0 + C_1 (CV)^{P_1} + C_2 (CG)^{P_2} + C_3 (DLG)^{P_3} + C_4 (DDG)^{P_4} + C_5 (DD)^{P_5} + C_6 (A+U)^{P_6} + C_7 (A/C)^{P_7} + C_8 (QSAM)^{P_8}$$

where the C's are the set of real number coefficients, and
the P's are the set of positive real numbers uniquely associated
with this result. The values for the C's and P's are
determined by the least squares fit to the best data available.*

The quantities in the parentheses are the independent variables
of the equation. Their values are:

CV   = effective number** of CVA's in the task force +
       effective number of CVS's in the task force,
CG   = effective number of CG's in the task force,
DLG  = effective number of DLG's in the task force,
DDG  = effective number of DDG's in the task force,
DD   = effective number of DD's in the task force,
A+U  = effective total number of units of all types
       in the AMP and/or URG force associated with
       the main task force,
A/C  = number of attacking RED aircraft (or, number
       of attacking cruise missiles, as required),
QSAM = quality level of BLUE SAM's used to shoot down
       RED aircraft or missiles. The input value
       represents high performance, low performance,
       or an interpolated level of performance.


The equation used to predict the mean number of hits on some
resource other than a CVA is identical in form to the above equation.
In this event, the program selects from a series of "RED aircraft
attack" arrays those values of the coefficients and powers pertaining
to the result required.

The application of the equation to a cruise missile attack
requires only a minor variation in the list of independent variables:
in the schematic example above, "A/C" becomes the number of attacking
cruise missiles. The program now selects the appropriate constants
from a series of "cruise missile attack arrays."
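
The repeated use of one equation form with different prestored constants can be sketched as follows. The function and the coefficient values are hypothetical and serve only to show the mechanics: one constant term plus eight coefficient-and-power terms, with the constants looked up per result and per attack type. None of the numbers come from the SIM II fits.

```python
VARIABLES = ("CV", "CG", "DLG", "DDG", "DD", "A+U", "A/C", "QSAM")


def predict_hits(coeffs, powers, force):
    """Evaluate HITS = C0 + sum of C_i * (variable_i ** P_i) over eight terms.

    coeffs : nine values C0..C8
    powers : eight positive values P1..P8
    force  : dict giving the value of each independent variable
    """
    if len(coeffs) != 9 or len(powers) != 8:
        raise ValueError("expected 9 coefficients and 8 powers")
    total = coeffs[0]
    for c, p, name in zip(coeffs[1:], powers, VARIABLES):
        total += c * force[name] ** p
    return total


# Hypothetical prestored arrays, keyed by attack type and by result wanted.
ARRAYS = {
    "aircraft": {
        "hits_on_worst_hit_CVA": (
            [1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0],   # C0..C8
            [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0],        # P1..P8
        ),
    },
}

if __name__ == "__main__":
    force = {"CV": 2.0, "CG": 1.0, "DLG": 2.0, "DDG": 4.0,
             "DD": 6.0, "A+U": 0.0, "A/C": 4.0, "QSAM": 1.0}
    c, p = ARRAYS["aircraft"]["hits_on_worst_hit_CVA"]
    print(predict_hits(c, p, force))
```

A cruise missile attack would use the same function with a second set of arrays and with "A/C" holding the number of attacking missiles.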


For a given air attack by RED, whether by aircraft or by cruise
missile, TASKAIR works through the sequence of equations to get the
individual mean number of hits (from the unit with the largest number
of hits to the unit with the smallest within types) on the various
CVA's, DLG's, DDG's, the CVS and the CG. All other units are
represented by a single prediction equation for each unit type. Specifically,
the DD's attached to the carrier group, the AMP and URG main-body ships,
and the AMP and URG escorts each receive their hits in totals for their
respective types. These particular units are treated as aggregates
because the number of hits which kill each unit generally is very small.


**By effective number of units is meant the total obtained by adding
together the effectiveness values of the surviving units of a given
type. User inputs determine the fractional effectiveness of a unit
after it has received one or more hits. Although the provisional
relationship between hits and unit effectiveness is linear, this can
be modified to reflect cases where earlier or later hits have
disproportionate effects.

An example of the current procedure is: given a kill deletion
criterion of 6 hits, a unit receiving 1 hit would have 5/6 effectiveness
and with 2 hits would have 4/6 effectiveness.
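
The linear rule in this footnote can be sketched in a few lines of Python. This is only an illustration under the stated linear assumption; the function names `unit_effectiveness` and `effective_number` are hypothetical and are not taken from TASKAIR.

```python
def unit_effectiveness(hits, hits_to_kill):
    """Fractional effectiveness of one unit under the linear rule.

    A unit with no hits is fully effective (1.0); a unit whose hits
    reach the kill criterion has no effectiveness left (0.0).
    """
    return max(0.0, 1.0 - hits / hits_to_kill)


def effective_number(hit_counts, hits_to_kill):
    """Effective number of units of one type: the sum of the
    effectiveness values of the individual units."""
    return sum(unit_effectiveness(h, hits_to_kill) for h in hit_counts)


# The footnote's example: kill criterion of 6 hits.
one_hit = unit_effectiveness(1, 6)    # 5/6
two_hits = unit_effectiveness(2, 6)   # 4/6
# Three units with 0, 1 and 2 hits respectively.
force = effective_number([0, 1, 2], 6)
```

A nonlinear relationship, as the footnote allows, would replace only the body of `unit_effectiveness`.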

V. COMBINING CURRENT HITS WITH PAST DAMAGE RECORDS. For each
individual or group result obtained from an equation, there exists
a matching cumulative damage record or hit counter. The content
of this hit counter is the damage from prior air and torpedo attacks.
These cumulative hit counters are permanent records associated with
each of the CVA's, DLG's, DDG's, the CVS and the CG; for the other
units, hit records are kept by totals for all units of a given type.

A  detailed  example  of  an  individual  hit  counter  is  given  under  the 
heading  "Assignment  of  Hits  in  Subsequent  Attacks." 

VI.  DELETING  UNITS.  The  third  function  of  TASKAIR  is  to  test 
up-dated  damage  or  hit  counters  for  kills. 

In the case of hits on the grouped units present, the number
killed is determined by the number of times the hits-to-kill criterion
divides into damage or hits calculated for this type of unit. For
example, if there are seven hits on a number of DD's with the carrier
task force, and if it takes two hits to kill a DD, then 3.5 DD's are
deleted.


If the number of hits on a unit treated individually (CVA,
CVS, or DDG) equals or exceeds the input hits-to-kill criterion for
this type, the unit is deleted.
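
The two deletion tests just described can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes that fractional group deletions (such as 3.5 DD's) are carried as effective units, as the worked example indicates.

```python
def group_units_deleted(total_hits, hits_to_kill):
    """Grouped units: the number deleted is the number of times the
    hits-to-kill criterion divides into the hits on the whole type."""
    return total_hits / hits_to_kill


def individual_unit_deleted(hits, hits_to_kill):
    """Individually treated units: the unit is deleted once its hit
    counter equals or exceeds the hits-to-kill criterion."""
    return hits >= hits_to_kill


# Seven hits on the DD's, two hits to kill a DD.
dd_deleted = group_units_deleted(7, 2)
```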

Thus  for  both  types,  the  effective  number  (not  hulls)  of  a 
given  type  of  unit  remaining  after  an  attack  equals  the  original 
effective  number  prior  to  the  current  attack  minus  effective  units 
deleted  with  this  attack.  However,  group  and  individually  treated 
units  do  differ  in  regard  to  the  method  of  "hull"  deletion. 

VII. ASSIGNMENT OF HITS IN SUBSEQUENT ATTACKS. The cumulative
damage records for the groups of non-deleted individual units (CVA's,
DLG's, and DDG's) are ranked in order to reflect the damage doctrine
in effect. The damage doctrine input reflects the degree of BLUE's
ability to align his resources optimally with respect to the azimuth
of the next air attack, or of RED's ability to thwart BLUE's
predictive capability. Consequently, the input choice causes the damage
from the RED air attack to be suffered according to one of three
options:

1. The units receiving the most hits are the units
with the least amount of prior damage, or

2. The units receiving the most hits are the units
with the most amount of prior damage, or

3. A weighted selection of options (1) and (2).

As an example, assume that three surviving CVA's have received
totals of 2.0, 2.5, and 3.9 hits respectively in all previous attacks.
Further, assume that the three have received 1.1, 1.7, and 2.4 hits in
the current interval. The individual cumulative damage or hit counters
would be tallied differently under different damage doctrines although
the total number of hits is identical:

CUMULATIVE DAMAGE (HIT) COUNTER CONTENTS

              Damage Option 1           Damage Option 2
            Old    New    Total       Old    New    Total
1st CVA     2.0    2.4     4.4        3.9    2.4     6.3
2nd CVA     2.5    1.7     4.2        2.5    1.7     4.2
3rd CVA     3.9    1.1     5.0        2.0    1.1     3.1
            ---    ---    ----        ---    ---    ----
Total       8.4    5.2    13.6        8.4    5.2    13.6

(Throughout these calculations outcomes representing subtotals are
adjusted in terms of predicted totals. For example, it may be assumed
here that in these calculations the predicted hits on all of the CVA's,
8.4 and 5.2, have been adjusted to conform to fleet totals in each of
the go-rounds and that hits on individual CVA's have in turn been adjusted
to conform to CVA totals. This is reasonable since fitting curves to outcomes
representing large aggregates of units is more reliable than to smaller
aggregates.)

Note that these options may produce different numbers of kills
depending upon input kill criteria. E.g., if the kill criterion is 6.0,
no kill results under option 1 and 1 kill results under option 2.
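
The tally above can be reproduced with a short Python sketch of options 1 and 2. The function names are hypothetical, totals are rounded to one decimal place to match the table, and option 3 is omitted because the text does not specify the weighting.

```python
def assign_hits(prior, new_hits, option):
    """Pair current hits with prior damage under damage option 1 or 2.

    Returns (old, new, total) rows ordered from the unit receiving
    the most current hits to the unit receiving the fewest.
    """
    new_desc = sorted(new_hits, reverse=True)
    if option == 1:      # most hits go to the least damaged unit
        old_order = sorted(prior)
    elif option == 2:    # most hits go to the most damaged unit
        old_order = sorted(prior, reverse=True)
    else:
        raise ValueError("only options 1 and 2 are sketched here")
    return [(old, new, round(old + new, 1))
            for old, new in zip(old_order, new_desc)]


def count_kills(rows, kill_criterion):
    """Number of units whose updated counter meets the kill criterion."""
    return sum(1 for _, _, total in rows if total >= kill_criterion)


PRIOR = [2.0, 2.5, 3.9]      # hits from all previous attacks
CURRENT = [1.1, 1.7, 2.4]    # hits in the current interval

option_1 = assign_hits(PRIOR, CURRENT, 1)
option_2 = assign_hits(PRIOR, CURRENT, 2)
kills_1 = count_kills(option_1, 6.0)
kills_2 = count_kills(option_2, 6.0)
```

With a kill criterion of 6.0, only the 6.3 counter under option 2 reaches the threshold, which agrees with the note above.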

VIII.  THE  "HULL”  COUNTING  CONCEPT.  Historically,  CEM  and  other  models 
have  measured  results  in  terms  of  the  effective  number  of  units.  A  new 
concept  has  been  added  here  by  measuring  results  in  terms  of  the  gross 
number  of  units,  or  "hulls".  Both  methods  are  important  and  they  comple¬ 
ment  each  other. 

The concept of hull counting can be used in at least two ways:
(1) to simulate internally in the model more realistic interactions in
terms of fuel and ammunition requirements, and (2) as a measure of
effectiveness, specifically "CVA Hull Days on the Line", which statistic is
now available as an output.

The  new  TASKAIR  methodology  demonstrates  that  the  experience 
obtained  from  fleet  exercises,  from  running  computer  models  and  from 
theoretical  investigations  can  be  reduced  to  equations  referenced  as 
needed  in  CEM. 

The  new  TASKAIR  subroutines  show  that  by  using  predictive 
equations  developed  from  a  reliable  data  source  both  the  effects  of 
attacks  on  units  in  the  task  force  areas  and  the  losses  to  the  attackers 
can  be  determined  and  used.  The  subroutines  calculate  and  maintain  damage 
records,  delete  units  and  reflect  the  redisposition  of  force  to  maximize 
defensive  potential. 

Structurally the set of small TASKAIR subroutines deals with actions
such as hits, kills, and damage calculations. These action subroutines
are generally applicable to all types of warfare. For example, the
methodology of the hits-to-kill calculations and the updating of damage
histories has been applied to the torpedo warfare of the WARCTF subroutine
of CEM.

IX. SUMMARY OF METHODOLOGY. The use of SIM II and the study of data
generated by the model have led to the development of two new aspects of
the force structure methodology which have been incorporated in the CEM.
Both of these new developments can be, and have been, applied equally well
to ASW and AAW problems. These developments are:

1.  The  results  of  other  investigations,  (whether  they  be  obtained 
from  theory,  from  fleet  exercises,  or  from  using  models  such  as 
SIM  II)  can  be  represented  by  equations  used  in  subroutines  by 
the Campaign Execution Model as required.

2.  The  CEM  Model  can  remove  units  from  the  action  according  to 
input  damage  levels  fixed  by  input  choice.  The  outcomes  of  SIM  II 
were  summarized,  not  only  in  equations  which  predict  mean  number 
of  total  hits  on  all  CV's,  all  DLG's,  and  on  all  DDG’s,  but  in 
the  form  of  predicting  a  mean  number  of  hits  on  the  CV  which  gets 
hit  worst,  on  the  CV  which  gets  hit  next  worst  and  so  on  through 
the  list  for  all  of  the  types  of  units  of  interest. 

In  addition  to  the  equation  originally  sought,  giving  the  number  of 
attacking  RED  aircraft  shot  down,  there  are  now  two  basic  groups  of  equations 
obtained  from  running  SIM  II  based  on  the  type  of  RED  air  attack.  The  first 
group  represents  attacks  by  RED  aircraft  in  which  the  number  of  aircraft  in 
the  raid  is  one  of  the  inputs.  The  other  group  represents  attacks  by  cruise 
missiles from RED submarines in which the number of cruise missiles launched
is  an  input. 

The  equations  predicting  hits  on  task  force,  underway  replenishment 
groups,  and  amphibious  units  provide  the  framework  of  the  new  subroutines. 

The  subroutines  have  been  programmed  so  that  a  single  basic  equation  is  used 
repetitively  (cycling  through  both  types  of  air  attack)  with  appropriate 
coefficients  and  powers  utilized  for  each  kind  of  BLUE  and  RED  resources 
to  predict  mean  numbers  of  hits  on  the  BLUE  units  of  interest.  These 
predicted  hits  are  added  to  cumulative  hit  records  which  are  maintained 
throughout a run. Units are deleted according to input thresholds using
these records.

These  developments  have  been  incorporated  in  the  new  TASKAIR 
subroutines  and  in  the  subroutines  which  calculate  damage  from  submarine 
torpedo  attack. 

DESIGN OF SURVEY SYSTEMS USING NONLINEAR PROGRAMMING METHODS


CPT  Clifford  W.  Greve 
Advanced  Technology  Division 
Computer  Sciences  Laboratory 
US  Army  Engineer  Topographic  Laboratories 
Fort  Belvoir,  Virginia 

ABSTRACT.  This  paper  discusses  theory  and  results  of  an  attempt  to  use 
nonlinear  programming  methods  to  arrive  at  an  optimal,  in  the  sense  of  least 
cost,  solution  to  the  design  of  a  survey  system  to  meet  specified  accuracy. 
In  other  words,  the  method  determines  the  combination  of  various  types  of 
observations  which  will  yield  the  required  accuracy  of  control  points  for 
a  minimum  cost.  The  theoretical  background  of  the  procedure  is  discussed, 
and  methods  of  extension  to  photogrammetry  and  other  sciences  are  presented. 
Much  of  the  paper  is  concerned  with  discussing  results  of  numerical  solu¬ 
tions  for  the  optimal  design  of  several  small,  but  typical  mapping  problems. 

It is believed that this research is original with the author, as
extensive literature searches and correspondence have produced no knowledge
of prior research into this application of nonlinear programming. The
method at its current state of development appears to be capable of
yielding significant improvements to the present concepts of survey
network design.

DISCUSSION OF PROBLEM. In experimental science a frequently recurring
problem is to determine certain non-measurable parameters which are
functionally related to measurable parameters. In addition, in most cases
the relating functions are known. If one now knew the influence of cost
upon the value of the variance-covariance matrix of the observable
parameters, then, by the method of least squares, he could arrive at the
influence of cost upon the accuracy of the non-measurable parameters. One
may then reverse the problem and ask what would be the minimum cost of
attaining a specified accuracy for the non-observable variables. This is
the problem to which this paper is addressed. It should be noted at this
point that the mathematics and method presented herein are applicable to all
problems of optimization of experimental design as outlined above. At
any point where a method is restricted to the specific problems of geodesy
and photogrammetry, particularly horizontal control surveys, special
mention will be made of this fact.

At this point a brief review of the history of attempts at solving
the above problem might be in order. The first such attempts consisted
of generalized specifications. These specifications were generally rigid
enough to insure that accuracy requirements were met, but often required
far more work than was actually necessary.

The remainder of this paper has been reproduced photographically from the
author's manuscript.

With electronic computers came the age of computer simulation. By
choosing the set of measurements to be made before actually making them,
the error could be propagated and the error for the non-measurable
parameters found. One would then pick a near optimal solution simply by
choosing the least costly set of observations from among those sets of
observations yielding acceptable results for the unknowns. With the
advent of the "Kalman Filtering" algorithms, which are nothing more
than sequential least squares if no time variation is involved, the
capability for easily adding and deleting observations was realized.
This facilitated the search for least cost solutions. However, since
for most real world problems there is a practical infinity of possible
observation sets, the distinct possibility still exists that the optimum
(least cost) data set might never be tried. Thus, although a minimum
cost for the set of solutions tried can be obtained, the solution giving
true minimum cost may not be in the set tried, and therefore will not
be discovered.

The  method  proposed  herein  alleviates  this  problem  of  omission  of 
data  sets,  because  all  possible  observations  are  considered,  and  the 
computer  picks  which  ones  to  use  to  arrive  at  an  optimum  configuration. 
Thus,  the  solution  of  the  design  of  experiments  problem  becomes  less 
dependent upon human past experience, and more dependent upon analytical
methods.

not  reproducible 

where $\{\ge, \le, =\}$ means that one of the symbols $\ge$, $\le$, $=$
applies, and X is a vector. The $g_i$ and $f$ may be any functions of X,
linear or nonlinear. If they are all linear, the problem is termed a linear
programming problem. Otherwise it is a nonlinear programming problem.

A practical method for solving the linear programming problem was
developed by George Dantzig in 1948. This algorithm, called the simplex
algorithm, consists of an ordered search of the vertices of the feasible
solution set (the set of all vectors X which satisfy the constraints).
The search is ordered such that the next vertex looked at is always at
least as good (from the viewpoint of maximizing or minimizing the
function f) as the one before. Since, for the linear programming problem,
the boundaries of the feasible solution set are hyperplanes in n-space
(if X is an n-vector), there are only a finite number of such vertices,
and therefore the process must converge in a finite number of iterations.
This still leaves open the possibility that, for large m and n, the number
of iterations could become a very large finite number, but in practice it
has been found that the algorithm will normally converge in a very
reasonable number of iterations.

A mathematical description of the simplex algorithm follows. No
proofs are given, or even hinted at. Proofs may be found in (1), or
any good text on linear programming.


Given  the  linear  programming  problem 

We change the problem into so-called standard form by adding so-called
slack variables to each of the first j constraints to transform them
into equalities. The cost of these slack variables is, of course, zero,
so the vector C will not be influenced.


We now have

$$\min\; C^T X$$

subject to

$$\sum_j g_{1j} x_j + x_{m+1} = b_1$$
$$\vdots$$
$$\sum_j g_{ij} x_j + x_{m+i} = b_i$$
$$\sum_j g_{i+1,j} x_j + 0 = b_{i+1}$$
$$\vdots$$

The above is called the linear programming problem in standard form.

Changing the notation slightly, we can write the above problem as

$$\min\; C^T X$$

subject to

$$GX = B$$

where G is a matrix.

For the problem above, we define a basic feasible solution as any
solution X having as many positive elements as there are constraint
equations. (For any problem, either the number of constraints will be
less than n, the number of elements of X, or one or more of the
constraints will be redundant.) This is not a precise definition, but it
is as close as one can come without assuming a detailed knowledge of
convex set and linear manifold theory. The interested reader will find
this elaborated upon in (1).


We will now define the simplex algorithm, beginning with the assumption
that we possess one basic feasible solution X. Define $C^*$ to be the
costs associated with the current basis vectors, in order. Define the
vector $Z = C^{*T} G$. Form $\bar{Z} = C - Z$. There are three possible
cases to consider.

1. If $\bar{Z}_k \ge 0$ for all k, then X is optimal.

2. If for some k (not corresponding to one of the positive
elements in X) $\bar{Z}_k < 0$ and $G_{ik} > 0$ for some i, then by
replacement of a vector in the basic feasible solution, a better value of
the objective function may be obtained. The vector to come into the basis
is chosen

One should note that this change of basis vectors is exactly the same
as the process involved in Gaussian Elimination. In fact, Gaussian
Elimination is simply a process of changing basis vectors, although it
is seldom presented as such.


3. If for some k the element $\bar{Z}_k$ of $C - Z$ is negative and
$G_{ik} \le 0$ for all i, i = 1, ..., p, then there exists no lower bound
for Z and the problem has no optimal solution.


The  iterative  procedure  is  then  continued  with  the  new  basis  until 
either  an  optimum  solution  is  found,  or  the  non-existence  of  a  mini¬ 
mum  is  determined. 

We have assumed that the user has available an initial feasible
solution. In very few problems will this be the case. Therefore,
the user will be faced with determining a starting point for the
simplex algorithm. A method called the method of artificial variables
has been developed to solve this problem. The method consists of
appending to the right of the G matrix of the linear programming problem
in standard form a unit matrix. The cost function is changed so that the
costs of the actual variables are zero, and the costs of the new
variables introduced, called artificial variables, are high, say 1000.
Since the real variables cost nothing, the simplex algorithm will attempt
to drive the artificial variables (which initially constitute a basic
feasible solution, obviously) out of the basis. Once they have been
driven out, the cost function may be replaced with the real one, and
with the artificial variables now deleted from the problem, the simplex
algorithm will continue on to solve the original problem. Pitfalls
which may arise with this method will not be discussed here. For a
discussion see (1).

A  sample  linear  programming  problem  follows: 

Maximize

$$Z = x + 5y$$

subject to

$$5x + 6y \le 30$$
$$3x + 2y \le 12$$
$$x \ge 0$$
$$y \ge 0$$
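
Because the optimum of a linear program lies at a vertex, a problem this small can be checked by listing every vertex. The Python sketch below assumes the constraints read 5x + 6y <= 30, 3x + 2y <= 12, x >= 0, y >= 0 (the relation symbols are damaged in the source). It enumerates all vertices rather than following the ordered search of the simplex algorithm, and all names in it are hypothetical.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
CONSTRAINTS = [
    (5.0, 6.0, 30.0),
    (3.0, 2.0, 12.0),
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]


def objective(x, y):
    """Objective function Z = x + 5y, to be maximized."""
    return x + 5.0 * y


def feasible_vertices(constraints, tol=1e-9):
    """Intersect every pair of boundary lines and keep the
    intersection points that satisfy all of the constraints."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundaries never intersect
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            points.append((x, y))
    return points


vertices = feasible_vertices(CONSTRAINTS)
best_x, best_y = max(vertices, key=lambda p: objective(*p))
best_z = objective(best_x, best_y)
```

Under these assumptions the feasible vertices are (0, 0), (4, 0), (1.5, 3.75) and (0, 5), and the largest value of Z is found at (0, 5).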


Although on the surface a nonlinear programming problem would
appear to be much the same as a linear programming problem, there are
in reality many differences, each of which adds to the complexity of the
nonlinear problem.

The first complexity is concerned with the convexity properties
of the set of feasible solutions. Since convexity will not be covered
here (see (1)), we will be satisfied to say that, in the linear case,
convergence to a unique answer is guaranteed (if there is any optimum
answer) by the convexity of the set of feasible solutions. This convexity
is guaranteed by the fact that the constraints are linear inequalities and
thus define hyperplanes which separate half spaces which are convex sets.
Without the proper convexity properties, which is a common problem in
nonlinear programming, solutions leading to local optima can occur, and
therefore the solution obtained may not be the global optimal solution.
Fortunately, the problems addressed to date in this research do not have
this difficulty, but it was thought pertinent at this point to mention
that the problem could occur.

The  other  principal  problem  with  nonlinear  programming  is  that  in 
many  cases  the  optimum  solution  will  occur  not  at  a  vertex  of  the  feasi¬ 
ble  solution  set,  but  along  an  edge.  You  will  remember  that  the  simplex 
algorithm  depended  upon  searching  vertices.  It  turns  out  that  the  guar¬ 
antee  of  convergence  in  a  finite  number  of  iterations  is  dependent  upon 
this  fact,  since  there  are  only  a  finite  number  of  vertices.  However, 
if  all  possible  boundary  points  are  considered  as  possible  solutions,  then 
there  are  obviously  uncountably  infinite  possible  solutions,  and  a  mere 
search,  even  though  ordered,  cannot  be  guaranteed  to  converge  finitely. 

The following will illustrate the nonlinear problem:

$$\min\; Z = x + y$$

subject to

$$xy \ge 9$$
$$x \ge 1$$
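
A short numerical check makes the point about edges concrete. The Python sketch below assumes the constraints read xy >= 9 and x >= 1 (the relation symbols are damaged in the source). On the boundary xy = 9 the objective becomes x + 9/x, which is searched by a simple interval reduction; the function names are hypothetical.

```python
def z_on_boundary(x):
    """Objective Z = x + y evaluated on the curved boundary y = 9 / x."""
    return x + 9.0 / x


def ternary_minimum(f, lo, hi, iterations=200):
    """Locate the minimizer of a unimodal function on [lo, hi] by
    repeatedly discarding a third of the interval."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


# The only vertex of the feasible set is (1, 9), where x = 1 meets xy = 9.
z_vertex = z_on_boundary(1.0)

# Search along the curved edge for the true minimum.
x_best = ternary_minimum(z_on_boundary, 1.0, 9.0)
y_best = 9.0 / x_best
z_best = z_on_boundary(x_best)
```

Under these assumptions the minimum lies near x = y = 3 with Z = 6, on the curved edge, while the vertex (1, 9) gives Z = 10. A search restricted to vertices would therefore miss the optimum.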

Assuming that one is faced with solving a nonlinear programming
problem which has a feasible solution set which is convex, i.e. a unique
solution, one method for solving the problem is the method of separable
programming. This method requires that each of the constraint functions,
and the objective function, may be written as a sum of functions of one
variable. Not all problems are of this type; however, by manipulation,
most can be reformulated in such a way that they are separable (see (1)).


Given a separable programming problem, it may be reformulated as a
linear programming problem in the following manner. For each variable,
that variable axis is partitioned arbitrarily over the range of
conceivable values for that variable, giving partition points $X_i$. One
now defines $\lambda_i$ such that

$$X = \sum_i \lambda_i X_i, \qquad \sum_i \lambda_i = 1,$$

and if more than one $\lambda_i$ is positive, then there can be at most
two and these must be adjacent. The $\lambda_i$ are interpolative
constants, and therefore, if $g_i = g(X_i)$, the separate functions in the
constraints may be written as $\sum_i \lambda_i g_i$. Therefore, we are
left with a linear programming problem with the $\lambda_i$ as variables,
subject to the added constraints that the $\lambda_i$ for each original
variable sum to one, and the condition that, for $\lambda_i$ corresponding
to the same original variable, no more than two $\lambda_i$ may be
positive, and these two must be adjacent. The method of artificial
variables may be used to obtain an initial feasible solution, the same as
in normal linear programming problems.
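
The role of the interpolative constants can be illustrated for a single variable. The Python sketch below computes the weights for a given value and uses them to approximate a function from its values at the partition points. The function names, the partition and the test function are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


def lambda_weights(breakpoints, x):
    """Interpolative weights for x on a sorted partition.

    At most two weights are positive, they belong to adjacent
    partition points, and all weights sum to one.
    """
    if not breakpoints[0] <= x <= breakpoints[-1]:
        raise ValueError("x lies outside the partitioned range")
    weights = [0.0] * len(breakpoints)
    # Index of the partition interval [left, right] containing x.
    k = min(bisect_right(breakpoints, x) - 1, len(breakpoints) - 2)
    left, right = breakpoints[k], breakpoints[k + 1]
    t = (x - left) / (right - left)
    weights[k] = 1.0 - t
    weights[k + 1] = t
    return weights


def separable_approximation(breakpoints, g, x):
    """Approximate g(x) by the weighted sum of g at the partition points."""
    weights = lambda_weights(breakpoints, x)
    return sum(w * g(b) for w, b in zip(weights, breakpoints))


POINTS = [0.0, 1.0, 2.0, 3.0, 4.0]
weights = lambda_weights(POINTS, 2.5)
approx = separable_approximation(POINTS, lambda v: v * v, 2.5)
```

Here the square of 2.5 (exactly 6.25) is approximated as 6.5. A finer partition reduces the error at the cost of more variables in the resulting linear program.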

Applications to Least Squares Estimation Problems

Given a least squares estimation problem with a mathematical model
AX = L + V (possibly the result of linearization of a more complicated
model), we will now attempt to apply nonlinear programming to the cost
optimization of the problem of finding a set of observations which will
allow the values of the propagated errors to fall within certain bounds.
Before dealing with specific problems, the problem will be discussed in
general.

It has been stated previously that the variance-covariance matrix
$\Sigma_{XX}$ for the adjusted coordinates is given by
$(A^T \Sigma_{LL}^{-1} A)^{-1}$. The matrix A is, of course, constant.
However, $\Sigma_{LL}^{-1}$ is a function of several parameters which
influence the variance of each observation. Let the set of parameters
which influence the accuracy of the i-th observation be denoted by
a vector $M_i$. Then one arrives at a nonlinear programming problem of the
form

$$\min \sum_i C_i(M_i)$$

subject to

$$(A^T \Sigma_{LL}^{-1}(M_1, \ldots, M_n) A)^{-1} \le B$$

where $C_i(M_i)$ represents the cost of making the i-th observation, and
$\Sigma_{LL}^{-1}(M_1, \ldots, M_n)$ represents the inverse of the
variance-covariance matrix written as a function of $M_1, \ldots, M_n$.
The reader will notice that, in order to arrive at the constraint
equations, a matrix of functions must be inverted. As no straightforward
method for doing this exists to the knowledge of the author, some sort of
approximate method must be employed. One such method, which met with
rather limited success, was employed by the author in (1) and in a paper
given on this same topic at the 1969 ACSM-ASP Convention in Washington,
D.C. (2).




The author's research efforts during the past year have been almost
exclusively directed toward finding a better approximation to this inverse,
and it is believed that the approximation which will be presented will
alleviate all of the shortcomings of the previous method, with the added
advantage that it is materially faster in convergence than the older
method.


The new method is, strangely enough, based upon a common error
committed in the propagation of variance-covariance matrices for least
squares estimators involving nonlinear functions. This is to assume
that the parameter values obtained by using only a rough estimate of
the true variance-covariance matrix of the observations are correct, and
to propagate using this information. Thus it is assumed that


where $\bar{\Sigma}_{LL}$ is the approximation to the variance-covariance
matrix. From statistical theory we know that, if PQ = T and the
variance-covariance matrix of Q is $\Sigma_{QQ}$, then

$$\Sigma_{TT} = P \, \Sigma_{QQ} \, P^T$$


Therefore, 

to four significant figures.

Consideration of the above phenomenon brings to mind a possible
solution for the method to be used to approximate the inversion of the
matrix $(A^T \Sigma_{LL}^{-1} A)$. First one makes a good guess as to what
the optimal values of the variables will be (it turns out that convergence
may be achieved for almost any guess, so the actual values of the guess are
unimportant). Then one solves the nonlinear programming problem, using

$$(A^T \bar{\Sigma}_{LL}^{-1} A)^{-1} A^T \bar{\Sigma}_{LL}^{-1} \, \Sigma_{LL} \, \bar{\Sigma}_{LL}^{-1} A \, (A^T \bar{\Sigma}_{LL}^{-1} A)^{-1}$$

as the left hand side of the constraints (with $\Sigma_{LL}$, of course,
written as a matrix of functions). The solution vector from this iteration
is then used to form a new $\bar{\Sigma}_{LL}$ for the next iteration.

In practice it has been found that the algorithm obtains convergence
for most problems within four to five iterations. The method has never
failed to converge on a problem. A detailed discussion of how the method
can actually be used on a rather small survey network will follow in the
next section.

Application to Several Small Survey Problems

It was decided to try the method on several small horizontal control
survey networks combining both direction and distance measurements. There
is nothing particularly special about this problem; it simply presents a
problem of small enough magnitude to be handled without sophisticated
programming methods, and yet one with enough complexity so as to pose some
challenge to the method.

Using regression analysis techniques, a model for the variance-covariance
matrix for a station adjusted set of directions as a function of the
number of pointings made on each point was derived. The variance of the
average of n readings on a distance is $\sigma^2 / n$, where $\sigma^2$ is
the variance of one reading, so no approximate mathematical model was
needed for distances.


Using these models, one begins by giving the nonlinear programming
program guesses at the number of times each measurement is made, a set
of required variances for chosen adjusted quantities, a cost vector
telling the cost of each repetition of each separate type of measurement,
and an observation, or partial derivative, matrix for the least squares
adjustment (referred to above as A).


The program then returns the number of times each observation should
be made, and the true variance-covariance matrix if the observations are
made that number of times.

At present there is no provision for using a nonlinear cost function
in the program, although such provision could easily be made. For
purposes of testing the method, such a cost function was deemed unnecessary.

Four problems were run, and each will be discussed in detail below:

PROBLEM 1

$$A = \begin{bmatrix} .71 & .71 & 0 & 0 \\ -.71 & .71 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The coordinates are given in terms of a north, east coordinate
system in meters. Distance observations numbered 1, 2, 3, and 4 were
made. On all runs it was required that the variances of the north and
east coordinates for both unknown points (the points on the right) be
under .5 meters. It was assumed that the variance of a single distance
observation was one meter$^2$.

With a cost vector C ($C_1$ = cost of each individual observation on
distance 1, etc.) equal to (1,1,1,1), it was found that to meet accuracy
requirements, distance 1 must be measured 4 times, two 5 times, three 5
times and four 4 times, with total cost 18.
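
The variances implied by a given set of repetition counts can be checked directly. The Python sketch below rests on several assumptions: that the matrix A reads as the four rows (.71, .71, 0, 0), (-.71, .71, 0, 0), (-1, 0, 1, 0), (0, 0, 0, 1), which is a reading of a damaged original; that the observations are uncorrelated; and that n repetitions of an observation with variance 1 give the mean a variance of 1/n. The helper names are hypothetical and this is not the author's program.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with
    partial pivoting (pure Python, intended for small matrices)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def propagated_variances(a, repetitions, single_variance=1.0):
    """Diagonal of (A^T W A)^-1 with weights W = diag(n_i / variance)."""
    rows, cols = len(a), len(a[0])
    weights = [n / single_variance for n in repetitions]
    normal = [[sum(weights[i] * a[i][j] * a[i][k] for i in range(rows))
               for k in range(cols)] for j in range(cols)]
    covariance = invert(normal)
    return [covariance[j][j] for j in range(cols)]


A = [
    [0.71, 0.71, 0.0, 0.0],
    [-0.71, 0.71, 0.0, 0.0],
    [-1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Repetition counts reported for the cost vector (1, 1, 1, 1).
variances = propagated_variances(A, [4, 5, 5, 4])
```

Under these assumptions all four propagated variances fall below the required .5, and the fourth depends only on the number of times distance four is measured.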

With C = (1,1,10,10), i.e., distances 3 and 4 being 10 times more
costly to observe than 1 and 2, we find the number of necessary observations
to be 5,5,5,4 with total cost 100.

With C = 1,1,10,1 we get 5,5,5,4 with total cost 64 and with
C = 10,10,1,1 we get 5,3,15,4 with total cost 95.

The above numbers are, in all cases, rounded to the next highest
even observation, and therefore some minor changes do not appear. It
is, however, evident that even for this very simple problem, the optimum
would not be achieved by measuring all distances the same number of
times. It should be noted that, since distance four is the only observation
determining the east coordinate of the upper unknown point, it must be
made the same number of times no matter what it costs, which
turns out to be true in the results. Also, it should be noted that when
the cost of the first two observations increases, the number of times
they are to be measured decreases. Why this does not apply to the third
observation to the same extent is not known; however, small changes
which are masked by the rounding up of the answers did appear.
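The total costs quoted for Problem 1 can be rechecked as a dot product of the cost vector with the repetition counts. The sketch below does this for the four runs as printed; `total_cost` is a hypothetical helper, not part of the author's program. The first three runs reproduce the reported totals. The printed counts for the fourth run give 99 rather than the reported 95, which may reflect the rounding described above or damage to the printed figures; the sketch does not try to resolve it.

```python
# Total cost = sum over observation types of (cost per repetition) x (repetitions).
def total_cost(costs, counts):
    return sum(c * n for c, n in zip(costs, counts))

# (cost vector, reported counts, reported total) for Problem 1, as printed.
runs = [
    ((1, 1, 1, 1), (4, 5, 5, 4), 18),
    ((1, 1, 10, 10), (5, 5, 5, 4), 100),
    ((1, 1, 10, 1), (5, 5, 5, 4), 64),
    ((10, 10, 1, 1), (5, 3, 15, 4), 95),
]
for costs, counts, reported in runs:
    print(costs, counts, "computed:", total_cost(costs, counts), "reported:", reported)
```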

Problem 2 was a very simple triangulation problem. It consisted
of two known stations and one unknown station, with three angles observed.

[Figure: sketch of the triangulation network for Problem 2, with station coordinates.]


A = [observation matrix garbled in the source; its legible entries are 0, ±.71 x 10^-3 and -1]

with the last three columns coming from the unknown orientation of the
sets of directions. With the requirement that the variances of the
coordinates of the unknown point be less than .5 x 10^[exponent illegible] meters²
and C = 1,10,5, with σ² for a single observation equal to 30 secs², we got
the number of observations to be 50,16,15 respectively with total cost
285. With C = 1,1,10 we get 36,25,1, total cost 71. With C = 1,100,10
we get 50,15,25, total cost 3800. With C = 1,1,1 we get 25,37,3.5,
total cost 65.3. The effect of changing the relative costs upon the
optimum solution is easily seen here. It must be remembered that all
of the above combinations of measurements meet the accuracy requirements,
but that the only differences were in the relative costs associated
with each measurement. It is easily seen that measurement costs
can greatly influence the configuration of the optimum solution.

Problem 3 combined direction and distance observations.

[Figure: sketch of the network for Problem 3.]

The variance of a single direction observation was 3 sec² and the
variance of a single distance observation was 1 meter². The variances on
north and east of the unknown point were required to be under .1 x 10^-2.

With C = 1,1,1,1,1 the optimum solution was 15,1,1,1,1, total cost
19. With C = 10,5,5,1,1 the solution was 15,1,2,1,1, total cost 167.

With C = 200,100,100,1,1 the solution was 14,1,1,25,25, total cost 3050.
It is obvious from these results that the directions at the unknown
point are the most critical observations. Even when they cost 200 times
as much as the distances, they must still be made 14 times for an optimal
solution. Also, it should be noted that, especially for this
problem, the direction observations lend much more strength to the network
than the distance observations.


Problem 4 was a somewhat more complex combined angle and distance
measurement problem.


The only condition enforced was that the variances of the north and east
coordinates of the far right hand point be less than .001 meters². The
variance for a single direction was taken as 3 seconds². With
C = 1,1,1,1,1,1,1,1 we get 5,5,5,1,1,1,5,17, total cost 40. With
C = 1,1,1,1,1,1,100,100 we get 5,5,50,1,1,1,1,13, total cost 1454.

With C = 1,1,4,1,3,1,100,100 we get 35,15,15,1,1,1,1,14, total cost
1593. The influence of the cost vector upon the optimal solution is
obvious in this case.

The reader will notice that the total cost is increased rather significantly
by the addition of cost constraints. This is logical, since
in most cases the cost of the most critical measurement was increased,
and, since this measurement was required essentially without respect to
cost, the total cost increases.

It should be mentioned here that we did not allow the values of
the variables to go to zero, since we incorporated no provision for
handling zero weights (or infinite variances). Also, since our polynomial
approximation ended at 50, no value could exceed 50. In only one case
does the restriction of 50 enter in; however, the restriction that
the variables could not be less than 1 entered into several problems.

A possible solution to this is discussed in the conclusions.

As  a  comparison,  for  problem  4  the  accuracy  specified  was  about 
one part in 100,000. If we assumed C = 1,1,1,1,1,1,1,1 and used a
set  of  specifications  telling  us  to  measure  each  direction  16  times 
(8  sets  of  both  direct  and  reverse)  and  to  measure  each  distance  4 
times,  the  total  cost  would  be  104,  vs  a  total  cost  of  40  for  the  op¬ 
timal  solution,  but  worse  than  that,  the  results  of  the  survey  would 
not  meet  required  accuracy.  This  example  should  make  the  advantages 
of  this  method  obvious  to  the  reader. 

Discussion of Applications to More General Problems

The method can be extended rather simply to general problems.

For problems involving trade offs between various possible methods, one
could simply set up the problem as if all observations possible were
being made and, if the variance functions are picked so that if an
observation is not made, it will have zero weight in the adjustment, then a
least cost solution would allow some measurements not to be made at all.
Thus the method could be used as a decision tool as to which measurements
should be made, or where control points should be placed, etc.
Ideally, in this mode, integer programming techniques should be used,
so that a binary (0,1) range of answers would be possible.

This would complicate the programming solution somewhat, but it is
certainly feasible.

Another possibility is that the method could be applied separately
to each of two or more competitive systems to find out the minimum cost
of attaining a given accuracy with each. It would then be simple to
see which system would be least costly for accomplishing a given purpose.
This application could save considerable expense by eliminating the
complete development of two competing systems when one is found to be
considerably less cost effective than the other.

It  should  be  realized  at  this  point  that  the  computer  programs 
used  to  apply  this  method  to  these  larger  problems  will  be  rather  large 
and  involved.  Therefore,  it  is  not  envisioned  at  this  time  that  this 
method  will  be  used  to  obtain  an  optimum  design  for  each  individual 
project,  but  only  to  arrive  at  an  optimum  design  for  a  typical  project 
of  a  class.  The  other  projects  in  this  class  could  be  accomplished  in 
a  near  optimum  manner  using  the  experience  gained  from  the  test  project. 

Conclusion 

As can be seen from the sample problem in the last section, the
method works relatively well for small problems. The difficulties
encountered in completely deleting measurements are being eliminated
by using integer programming techniques.

The method has been shown to be feasible for ground survey problems.
The author believes it to be feasible for larger and more general
classes of problems both in the mapping field and in other areas of
interest. Of course, the actual application of the method may have to be
tailored to the individual problem; however, the general concepts
will apply.

It is hoped that, in the next year, larger problems can be
attacked and the feasibility of the method for totally different problems
can be studied. The author would like to apply the method to
such things as optimal location of control for photogrammetric block
adjustments, configuration of navigational satellite systems, and other
problems.

If the author can be of any assistance to anyone else interested
in working in this area, he would be more than willing to discuss the
matter further.

ON THE RATE OF CHANGE IN THE SOLUTION
SET OF A PERTURBED LINEAR PROGRAM

Stephen  M.  Robinson 
Mathematics  Research  Center 
University  of  Wisconsin 
Madison,  Wisconsin 

Abstract. In this paper we find bounds for the displacement in
the solution set of a system of linear inequalities caused by perturbations
in the coefficient matrix and/or the right-hand side, and apply
these to estimate the error in the optimal solution of a perturbed
linear program. The main result is that if a superconsistent linear
program is subjected to a sequence of perturbations approaching zero,
for which a uniformly bounded sequence of (primal) solutions to the
perturbed program exists, then the distances from those solutions to
the solution set of the unperturbed program are of the large order of
the perturbations.

1. Introduction.

It is known that under certain regularity conditions the solution
set of a linear program (and of much more general programs) is upper
semicontinuous under perturbations in the constraints and/or the objective
function ([1], [2], [3]). Here "upper semicontinuous" is to be interpreted in
the sense applicable to set-valued mappings [1]. However, estimates for
the changes in the solution set under such perturbations seem not to have
been given. In this paper we develop such estimates. We proceed by first
developing bounds for the displacement in the solution set of a system of
perturbed linear inequalities, then apply these bounds to the case of a
linear program to obtain a bound for the distance from a point solving the
perturbed program to the solution set of the unperturbed program. Since
this bound is expressed in terms of constants which are shown to exist but
which may not be readily computable, it is not likely to provide computable
error estimates. However, it does permit us to draw conclusions about the
rate at which the solutions approach those of the unperturbed program.

These conclusions are summarized in Theorem 7, where we show that if the
unperturbed program is superconsistent and if there is a sequence of
primal solutions to the perturbed programs which can be uniformly
bounded as the perturbations approach zero, then the distances
from those solutions to the solution set of the unperturbed program are of
the large order of the perturbations in the coefficient matrix, right-hand side,
and objective function.

In the course of obtaining these results, we develop some properties
of linear inequalities which may be of independent interest; in particular we
show that each real matrix defines a number which behaves, with respect to
systems of inequalities involving that matrix, much as the norm of the inverse
of a non-singular square matrix does with respect to systems of equations.

We shall use upper and lower case Roman letters for matrices and
vectors respectively, and lower case Greek letters for scalars. $\mathbb{R}^k$ denotes
k-dimensional real linear space; $\|\cdot\|_\infty$ and $\|\cdot\|_1$ are used for the $\ell_\infty$
and $\ell_1$ norms, respectively (see, e.g., [4]). The symbol $e$ is used for a
vector with 1 in every component. The transpose of $x$ is written $x^T$.
Subscripts on vectors and matrices refer to components, except at the end of the
paper where they indicate membership in a sequence; there should be no
confusion as to which is intended in any particular place.

2.  Linear  Inequalities. 

In  this  section  we  present  various  results  about  linear  inequalities, 
some  of  which  we  shall  have  occasion  to  use  later  on.  We  first  show  that  any 
real  matrix  defines  a  number  with  the  properties  mentioned  in  Section  1.  The 
term  "polyhedral  norm"  means  any  norm  whose  unit  sphere  is  a  polyhedron. 

LEMMA: Let $A$ be an $m \times n$ real matrix and let $\|\cdot\|_m$ and
$\|\cdot\|_n$ be polyhedral norms on $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively. Then there is
a scalar $\mu(A)$, depending only on $A$, such that for each $b \in \mathbb{R}^m$, either

a. $Ax \le b$ has no solution, or

b. $Ax \le b$ has a solution $x$ with $\|x\|_n \le \mu(A)\|b\|_m$.

Further, there exists a $b \in \mathbb{R}^m$ such that $Ax \le b$ is consistent and for
every solution $x$ we have $\|x\|_n \ge \mu(A)\|b\|_m$.

PROOF: Let $P_m$ and $P_n$ be the unit polyhedra of $\|\cdot\|_m$ and
$\|\cdot\|_n$ respectively. Consider the set $B := \{b \mid b \in \mathbb{R}^m$ and $Ax \le b$ is
solvable$\}$. It is easy to see that
$B = A(\mathbb{R}^n) + \mathbb{R}^m_+ = A(\mathbb{R}^n_+) + (-A)(\mathbb{R}^n_+) + \mathbb{R}^m_+$,
where $\mathbb{R}^k_+$ is the non-negative orthant of $\mathbb{R}^k$. Thus $B$, being the sum
of three polyhedral convex cones, is itself a polyhedral convex cone. The
intersection of $B$ with the unit polyhedron $P_m := \{b \mid b \in \mathbb{R}^m$ and $\|b\|_m \le 1\}$
is nonempty since $0 \in P_m \cap B$; thus $P_m \cap B$ is a convex polyhedron, and
so is the convex hull of a finite number of extreme points, say $\{p_1, \dots, p_r\}$.
Now consider the function $\eta(q)$ defined on $B$ by $\eta(q) := \min\{\eta \mid \eta \ge 0$
and for some $x$, $Ax \le q$ and $x \in \eta P_n\}$. This function certainly exists for
any $q \in B$, since if $(x', \eta')$ is any feasible pair then $\eta(q)$ can be found
by minimizing $\eta$ over the compact set
$\{(x, \eta) \mid Ax \le q,\ x \in \eta P_n\} \cap \{\eta' P_n \times [0, \eta']\}$.

In addition, $\eta(q)$ is convex on $B$. To see this, let $q_1, q_2 \in B$ and
suppose $\eta(q_1) = \eta_1$, $\eta(q_2) = \eta_2$. Let $Az_1 \le q_1$ with
$\|z_1\|_n = \eta_1$ and $Az_2 \le q_2$ with $\|z_2\|_n = \eta_2$. Now choose any
$\lambda \in [0, 1]$ and consider the point $q_3 := \lambda q_1 + (1-\lambda)q_2$. Let
$z_3 := \lambda z_1 + (1-\lambda)z_2$; then clearly
$Az_3 = \lambda Az_1 + (1-\lambda)Az_2 \le \lambda q_1 + (1-\lambda)q_2 = q_3$,
and $\|z_3\|_n = \|\lambda z_1 + (1-\lambda)z_2\|_n \le \lambda\|z_1\|_n + (1-\lambda)\|z_2\|_n = \lambda\eta_1 + (1-\lambda)\eta_2$.
Thus $\eta(q_3) \le \|z_3\|_n \le \lambda\eta(q_1) + (1-\lambda)\eta(q_2)$, which proves the convexity
of $\eta$. Consider the restriction of $\eta$ to $P_m \cap B$; since $P_m \cap B$ is
the convex hull of $p_1, \dots, p_r$ and since a convex function on a convex
polyhedron attains its maximum at one of the extreme points (because any
point in the polyhedron is a convex combination of the extreme points), it
follows that for each $q \in P_m \cap B$ we have
$\eta(q) \le \eta(p_M) := \max\{\eta(p_i) \mid 1 \le i \le r\} =: \mu$.

Now let $b \in \mathbb{R}^m$. If $Ax \le b$ has no solution then conclusion (a.)
holds; thus, let $Ax \le b$ be solvable. If $b = 0$ then $x = 0$ is a solution
and $\|x\|_n \le \mu\|b\|_m$. If $b \ne 0$, let $\bar{b} := b/\|b\|_m$; then $\bar{b} \in P_m \cap B$, so
there is an $\bar{x}$ such that $A\bar{x} \le \bar{b}$ and $\|\bar{x}\|_n \le \mu$. Multiplying by $\|b\|_m$,
we find that with $x := \|b\|_m \bar{x}$ we have $Ax \le b$ and $\|x\|_n \le \mu\|b\|_m$.
Thus conclusion (b.) holds if conclusion (a.) does not.

Finally, consider $p_M$. If $p_M = 0$ we have $\mu = 0$, and clearly any
solution of $Ax \le 0$ has $\|x\|_n \ge 0 = \mu\|p_M\|_m$. Assume $p_M \ne 0$ and
$0 < \|p_M\|_m < 1$. Let $\lambda := 1/(1 + \|p_M\|_m)$; then $0 < \lambda < 1$. Both
$r_1 := \|p_M\|_m\, p_M$ and $r_2 := \|p_M\|_m^{-1}\, p_M$ lie in $B$ (because $B$ is a cone)
and in $P_m$ (because $\|r_1\|_m < 1 = \|r_2\|_m$), and it is easily seen that
$p_M = \lambda r_1 + (1-\lambda) r_2$, which contradicts the fact that $p_M$ is an extreme
point of $P_m \cap B$. Thus $\|p_M\|_m = 1$. We know that for each $x$ with
$Ax \le p_M$ we have $\|x\|_n \ge \mu$, but since $\|p_M\|_m = 1$ we have
$\|x\|_n \ge \mu\|p_M\|_m$. This completes the proof.

We remark that in the case of non-polyhedral norms, the first
conclusion of the lemma remains valid since any two norms on a
finite-dimensional space are equivalent.

For the remainder of this paper we shall let $\|\cdot\|_m$ and $\|\cdot\|_n$
be the $\ell_\infty$ norm on the spaces $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively.

To  indicate  one  application  of  this  lemma,  we  mention  that  it  may 
be  used  to  obtain  bounds  for  the  dual  variables  of  linear  programs  in  terms  of 
the  primal  variables  and  the  objective  function,  as  shown  in  the  following 
theorem. 

THEOREM 1: Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m$. Suppose
$\{x \mid Ax \le b\} \ne \emptyset$. Then there is a constant $\mu$, depending only on $A$ and $b$,
such that for every $p \in \mathbb{R}^n$ and every optimal solution $x$ (if any exists) of
the program $\min\{p^T x \mid Ax \le b\}$, there is a corresponding dual solution
$u \in \mathbb{R}^m$ with

\[
\|u\|_\infty \le \mu\,\|p\|_\infty \max(1, \|x\|_1).
\]

PROOF: The vector $u$ is a dual solution of the cited program if
and only if it satisfies

\[
u^T [A \;\; b] = -p^T [I \;\; x]
\]

and

\[
u \ge 0.
\]

By hypothesis, $x$ is a primal optimal solution, so at least one such $u$
exists. Rewriting this system as

\[
[A \;\; b \;\; {-A} \;\; {-b} \;\; {-I}]^T u \le \bigl[-p^T[I \;\; x] \;\;\; p^T[I \;\; x] \;\;\; 0\bigr]^T,
\]

letting $\mu$ equal the number associated by the preceding lemma with the
matrix on the left-hand side and estimating the $\ell_\infty$ norm of the right-hand
side by $\|p\|_\infty \max(1, \|x\|_1)$, we find that a dual solution $u$ will exist with
$\|u\|_\infty \le \mu\|p\|_\infty \max(1, \|x\|_1)$. This completes the proof.

Unfortunately, the above result is useful only in case $A$ and $b$
remain constant. Since we are interested in the case in which all three of
$A$, $b$ and $p$ may vary, we shall want to find out something about the behavior
of $\mu(A)$ as $A$ varies; in particular we should like to know whether (and when)
$\mu(A)$ is continuous in $A$.

It is immediately clear that $\mu(A)$ is not always continuous; for
example, consider the set of $1 \times 1$ real matrices. We have $\mu([0]) = 0$,
but for $\alpha \ne 0$, $\mu([\alpha]) = |\alpha|^{-1}$, so $\mu$ is not continuous at 0. This situation
is analogous to the problem of singularity for square matrices; to resolve
it we introduce a requirement analogous to that of full rank.
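The $1 \times 1$ example can be checked directly. The short derivation below is a sketch: it takes the absolute value as the norm on both spaces, and the case split on the sign of $b$ is ours rather than the author's.

```latex
Let $A = [\alpha]$ with $\alpha \neq 0$, so the system reads $\alpha x \le b$.
If $b \ge 0$, then $x = 0$ is a solution and $|x| = 0 \le |\alpha|^{-1}|b|$.
If $b < 0$, every solution satisfies $|x| \ge |b|/|\alpha|$, with equality at $x = b/\alpha$.
Hence
\[
\mu([\alpha]) = |\alpha|^{-1} \longrightarrow \infty \quad (\alpha \to 0),
\qquad\text{while}\qquad \mu([0]) = 0,
\]
because $0 \cdot x \le b$ is solvable only for $b \ge 0$, and then $x = 0$ is a solution.
```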

DEFINITION: We say that a matrix $A$ satisfies the PLI condition
if and only if the rows of $A$ are positively linearly independent; that is,
$u^T A = 0$ and $u \ge 0$ implies $u = 0$.

THEOREM 2: Let $A$ be an $m \times n$ matrix. The inequalities $Ax \le b$
are solvable for every right-hand side $b \in \mathbb{R}^m$ if and only if $A$ satisfies the
PLI condition.

PROOF: Follows directly from Gordan's theorem [5].
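A small illustration of Theorem 2 may help; the two matrices below are our own examples, not taken from the paper.

```latex
Take $A = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. With $u = (1, 1)^T \ge 0$ we have
$u^T A = 0$ and $u \ne 0$, so the PLI condition fails. Correspondingly, for
$b = (-1, -1)^T$ the system $Ax \le b$ reads $x \le -1$ and $x \ge 1$, which has no solution.

Take instead $A = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Then $u^T A = u_1 + 2u_2 = 0$ with
$u \ge 0$ forces $u = 0$, so the PLI condition holds, and for every $b \in \mathbb{R}^2$ the
system $Ax \le b$ is solved by $x = \min(b_1, b_2/2)$.
```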

If $A$ satisfies the PLI condition, then not only is $\mu(A)$ continuous
at $A$ but we can even find estimates for its variation near $A$. These are
given in the following analogue of Banach's lemma for nonsingular linear
transformations.

THEOREM 3: Let $A$ be an $m \times n$ matrix satisfying the PLI condition.
Then for every $m \times n$ matrix $H$ with $\mu(A)\|H\|_\infty < 1$, we have:

a. $A + H$ satisfies the PLI condition, and

b. $\mu(A)(1 + \mu(A)\|H\|_\infty)^{-1} \le \mu(A + H) \le \mu(A)(1 - \mu(A)\|H\|_\infty)^{-1}$.

PROOF: By Theorem 2 and the definition of $\mu(A)$, we can find an
$\bar{x}$ such that $A\bar{x} \le -e$ and $\|\bar{x}\|_\infty \le \mu(A)$. Now assume that $\mu(A)\|H\|_\infty < 1$;
then $(A + H)\bar{x} = A\bar{x} + H\bar{x} \le -e + \|H\|_\infty\|\bar{x}\|_\infty e \le -(1 - \mu(A)\|H\|_\infty)e < 0$, so by
Gordan's theorem [5] the matrix $A + H$ satisfies the PLI condition.

Now let $b \in \mathbb{R}^m$ be arbitrary. As above, there is some $x$ such that
$Ax \le b$ and $\|x\|_\infty \le \mu(A)\|b\|_\infty$. Then

\[
\begin{aligned}
(A + H)\bigl[x + (1 - \mu(A)\|H\|_\infty)^{-1}\mu(A)\|H\|_\infty\|b\|_\infty\,\bar{x}\bigr]
&\le b + Hx + (1 - \mu(A)\|H\|_\infty)^{-1}\mu(A)\|H\|_\infty\|b\|_\infty\bigl[-(1 - \mu(A)\|H\|_\infty)e\bigr] \\
&\le b + \|H\|_\infty\mu(A)\|b\|_\infty e - \mu(A)\|H\|_\infty\|b\|_\infty e = b .
\end{aligned}
\]

Further,

\[
\begin{aligned}
\bigl\|x + (1 - \mu(A)\|H\|_\infty)^{-1}\mu(A)\|H\|_\infty\|b\|_\infty\,\bar{x}\bigr\|_\infty
&\le \mu(A)\|b\|_\infty + (1 - \mu(A)\|H\|_\infty)^{-1}\mu(A)^2\|H\|_\infty\|b\|_\infty \\
&= \mu(A)(1 - \mu(A)\|H\|_\infty)^{-1}\|b\|_\infty ,
\end{aligned}
\]

so since $b$ was arbitrary we must have $\mu(A + H) \le \mu(A)(1 - \mu(A)\|H\|_\infty)^{-1}$.

Again let $b \in \mathbb{R}^m$ be arbitrary. Choose $y$ so that $(A + H)y \le b$
with $\|y\|_\infty \le \mu(A + H)\|b\|_\infty$, and consider the vector $y + \mu(A + H)\|H\|_\infty\|b\|_\infty\,\bar{x}$,
in which $\bar{x}$ is as previously defined. Then we have

\[
\begin{aligned}
A\bigl(y + \mu(A + H)\|H\|_\infty\|b\|_\infty\,\bar{x}\bigr)
&\le (A + H)y - Hy - \mu(A + H)\|H\|_\infty\|b\|_\infty e \\
&\le (A + H)y \le b,
\end{aligned}
\]

and

\[
\bigl\|y + \mu(A + H)\|H\|_\infty\|b\|_\infty\,\bar{x}\bigr\|_\infty \le \mu(A + H)(1 + \mu(A)\|H\|_\infty)\|b\|_\infty ,
\]

so $\mu(A) \le \mu(A + H)(1 + \mu(A)\|H\|_\infty)$, or $\mu(A)(1 + \mu(A)\|H\|_\infty)^{-1} \le \mu(A + H)$.

This completes the proof.

Consideration of the $1 \times 1$ matrix $A = [1]$ and the class of matrices
$H = [\epsilon]$, $|\epsilon| < 1$, shows that the above bounds are sharp.
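The sharpness claim can be checked numerically in the $1 \times 1$ case. The sketch below assumes the closed form $\mu([\alpha]) = 1/|\alpha|$ for $\alpha \ne 0$ from the earlier example; the function names `mu_1x1` and `theorem3_bounds` are hypothetical.

```python
def mu_1x1(alpha):
    """mu([alpha]) for a 1 x 1 matrix: 0 if alpha == 0, else 1/|alpha|."""
    return 0.0 if alpha == 0 else 1.0 / abs(alpha)

def theorem3_bounds(mu_a, h_norm):
    """Lower and upper bounds of Theorem 3(b); requires mu_a * h_norm < 1."""
    assert mu_a * h_norm < 1
    return mu_a / (1 + mu_a * h_norm), mu_a / (1 - mu_a * h_norm)

# A = [1], H = [eps]: compare mu(A + H) with the two bounds.
for eps in (-0.5, -0.1, 0.1, 0.5):
    lower, upper = theorem3_bounds(mu_1x1(1.0), abs(eps))
    value = mu_1x1(1.0 + eps)
    print(eps, lower, value, upper)
```

For positive eps the value coincides with the lower bound, and for negative eps with the upper bound, which is the sense in which neither inequality can be improved.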

We now turn our attention from the study of $\mu(A)$ to consideration
of perturbed systems of linear inequalities. The theorem we obtain here will
be our main tool in studying the behavior of perturbed linear programs.

Consider a system of linear inequalities $Ax \le b$, and assume this
system has at least one solution. We shall consider another solvable system
of the same dimensions, $\hat{A}x \le \hat{b}$, and pose the following problem: Given a
solution $\hat{x}$ to the second system, find a "small" $\ell_\infty$ sphere about $\hat{x}$ such
that a solution of the first system must lie within that sphere. Finding the
smallest such sphere is equivalent to determining the $\ell_\infty$ distance from $\hat{x}$
to the solution set of the first system. This problem is solved by the following
theorem.

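THEOREM 4: Let $A$ and $\hat{A}$ be $m \times n$ real matrices, and let $b$
and $\hat{b}$ be vectors in $\mathbb{R}^m$. Suppose $\bar{x}$ is such that $A\bar{x} \le b$. Then there
exist constants $\mu$ and $\nu$ depending only on $A$, $b$ and $\bar{x}$, such that for
each $\hat{x}$ with $\hat{A}\hat{x} \le \hat{b}$ there exists an $x$ with $Ax \le b$ and

\[
\|x - \hat{x}\|_\infty \le \bigl(\mu + \nu\|\hat{x} - \bar{x}\|_\infty\bigr)\bigl(\|A - \hat{A}\|_\infty\|\hat{x}\|_\infty + \|b - \hat{b}\|_\infty\bigr).
\]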

PROOF: Since $\bar{x}$ is a feasible point (solution) for the inequalities
$Ax \le b$, it follows that $b - A\bar{x} \ge 0$. Define $I_1 := \{i \mid (b - A\bar{x})_i > 0\}$
(sometimes called the carrier of $b - A\bar{x}$) and
$I_2 := \{1, \dots, m\} \setminus I_1 = \{i \mid (b - A\bar{x})_i = 0\}$. Rearrange the rows of $A$, $\hat{A}$, $b$ and $\hat{b}$, if necessary,
so that all the members of $I_1$ precede all the members of $I_2$. Thus we
obtain a partitioning of $A$ and $b$ (as well as of $\hat{A}$ and $\hat{b}$) as follows:

\[
A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix},
\]

where $b_1 > A_1\bar{x}$ and $b_2 = A_2\bar{x}$. Of course, either $I_1$ or $I_2$ could be
empty, in which case no partitioning would be necessary.

Now consider the linear program:

\[
\left.
\begin{aligned}
&\text{Find } y \text{ and } \epsilon \text{ to minimize } \epsilon \\
&\text{subject to } -\epsilon e \le (\bar{x} + y) - \hat{x} \le \epsilon e, \\
&\phantom{\text{subject to }} A(\bar{x} + y) \le b .
\end{aligned}
\right\} \tag{1}
\]

It is clear that $y$ and some $\epsilon$ are feasible for (1) if and only if $\bar{x} + y$
satisfies $A(\bar{x} + y) \le b$. Consequently, if we can solve (1), the value of $\epsilon$
will yield the minimum of $\|x - \hat{x}\|_\infty$ over all $x$ satisfying $Ax \le b$.

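Using the partitioning outlined above, the dual of (1) can be
written as:

\[
\left.
\begin{aligned}
&\text{Find } u_1, u_2, v_1, v_2 \\
&\text{to maximize } (u_1 - u_2)^T(\hat{x} - \bar{x}) - v_1^T(b_1 - A_1\bar{x}) \\
&\text{subject to } (u_1 - u_2)^T - v_1^T A_1 - v_2^T A_2 = 0, \\
&\phantom{\text{subject to }} (u_1 + u_2)^T e = 1, \\
&\phantom{\text{subject to }} u_1, u_2, v_1, v_2 \ge 0 .
\end{aligned}
\right\} \tag{2}
\]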

Both (1) and (2) are obviously feasible; hence by the duality
theorem of linear programming [5] they are both solvable and the extrema
are equal. Setting $x := \bar{x} + y$, we then see that $Ax \le b$ and that for any
$v_2$ such that $u_1$, $u_2$, $v_1$ and $v_2$ satisfy the constraints of (2), we have

\[
\begin{aligned}
0 \le \|x - \hat{x}\|_\infty &= (u_1 - u_2)^T(\hat{x} - \bar{x}) - v_1^T(b_1 - A_1\bar{x}) \\
&= (u_1 - u_2 - A_1^T v_1 - A_2^T v_2)^T(\hat{x} - \bar{x}) + v_1^T(A_1\hat{x} - b_1) + v_2^T(A_2\hat{x} - b_2) \\
&= [v_1^T \; v_2^T](A\hat{x} - b) \\
&= [v_1^T \; v_2^T]\{(\hat{A}\hat{x} - \hat{b}) + (A - \hat{A})\hat{x} - (b - \hat{b})\} \\
&\le [v_1^T \; v_2^T][(A - \hat{A})\hat{x} - (b - \hat{b})],
\end{aligned}
\]

where we have used the constraints of (2) and the facts that $\hat{A}\hat{x} \le \hat{b}$ and
$A_2\bar{x} = b_2$.


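Let $\delta := [\min\{(b - A\bar{x})_i \mid i \in I_1\}]^{-1} > 0$. Since $\|x - \hat{x}\|_\infty \ge 0$
we have $(u_1 - u_2)^T(\hat{x} - \bar{x}) \ge v_1^T(b_1 - A_1\bar{x}) \ge \delta^{-1}v_1^T e$, and thus

\[
\|v_1\|_1 = v_1^T e \le \delta(u_1 - u_2)^T(\hat{x} - \bar{x}) \le \delta\|\hat{x} - \bar{x}\|_\infty(\|u_1\|_1 + \|u_2\|_1)
= \delta\|\hat{x} - \bar{x}\|_\infty(u_1 + u_2)^T e = \delta\|\hat{x} - \bar{x}\|_\infty .
\]

The only requirement that we placed on $v_2$ was that it
satisfied $A_2^T v_2 = (u_1 - u_2) - A_1^T v_1$; by the lemma there exist a constant $\mu$,
depending only on $A_2$, and a solution $\bar{v}_2$ with

\[
\|\bar{v}_2\|_1 \le \mu\|u_1 - u_2 - A_1^T v_1\|_1
\le \mu(\|u_1\|_1 + \|u_2\|_1 + \|A_1\|_\infty\|v_1\|_1) \le \mu(1 + \delta\|A_1\|_\infty\|\hat{x} - \bar{x}\|_\infty).
\]

Thus, setting $v_2 = \bar{v}_2$, we obtain

\[
\begin{aligned}
\|x - \hat{x}\|_\infty &\le \|[v_1^T \; v_2^T]\|_1\bigl(\|A - \hat{A}\|_\infty\|\hat{x}\|_\infty + \|b - \hat{b}\|_\infty\bigr) \\
&= (\|v_1\|_1 + \|v_2\|_1)\bigl(\|A - \hat{A}\|_\infty\|\hat{x}\|_\infty + \|b - \hat{b}\|_\infty\bigr) \\
&\le \bigl[\mu + (1 + \mu\|A_1\|_\infty)\delta\|\hat{x} - \bar{x}\|_\infty\bigr]\bigl(\|A - \hat{A}\|_\infty\|\hat{x}\|_\infty + \|b - \hat{b}\|_\infty\bigr),
\end{aligned}
\]

which, with $\nu := (1 + \mu\|A_1\|_\infty)\delta$, completes the proof.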

We remark that if $b \ge 0$, then $\bar{x} = 0$ is a choice that satisfies
the requirements of the theorem. Also, in case $Ax \le b$ is superconsistent
and $A\bar{x} < b$, then $I_2 = \emptyset$ and the above estimate takes the simpler form

\[
\|x - \hat{x}\|_\infty \le \delta\|\hat{x} - \bar{x}\|_\infty\bigl(\|A - \hat{A}\|_\infty\|\hat{x}\|_\infty + \|b - \hat{b}\|_\infty\bigr),
\]

since the lemma is not required.
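The simpler estimate can be tried on a one-variable example. The sketch below is our own illustration, not from the paper: it takes A = [1], b = [1] and the interior point x̄ = 0, reads δ as the reciprocal of the smallest slack b − Ax̄ (here δ = 1), and perturbs to Â = [1 + a], b̂ = [1 + c] with |a| < 1. It then compares the true distance from a feasible x̂ to the set {x : x ≤ 1} with the bound over a small grid.

```python
def distance_to_solution_set(x_hat):
    """l_inf distance from x_hat to {x : x <= 1}, i.e. A = [1], b = [1]."""
    return max(0.0, x_hat - 1.0)

def simplified_bound(x_hat, a, c, x_bar=0.0):
    """delta * |x_hat - x_bar| * (|A - A_hat| * |x_hat| + |b - b_hat|).

    A_hat = [1 + a], b_hat = [1 + c]; delta = 1 / (b - A * x_bar) is assumed
    to be the reciprocal of the slack at the interior point x_bar.
    """
    delta = 1.0 / (1.0 - x_bar)
    return delta * abs(x_hat - x_bar) * (abs(a) * abs(x_hat) + abs(c))

worst_gap = float("inf")
for a in (-0.3, -0.1, 0.0, 0.1, 0.3):
    for c in (-0.2, 0.0, 0.2, 0.5):
        x_max = (1.0 + c) / (1.0 + a)      # largest x_hat with A_hat * x_hat <= b_hat
        for step in range(41):
            x_hat = x_max - 0.1 * step     # feasible points of the perturbed system
            gap = simplified_bound(x_hat, a, c) - distance_to_solution_set(x_hat)
            worst_gap = min(worst_gap, gap)
print("smallest (bound - distance) over the grid:", worst_gap)
```

A non-negative smallest gap means the bound held at every grid point; this is only a spot check of one instance, not a verification of the theorem.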


3.  Linear  Programs. 

We  now  apply  Theorem  4  to  the  problem  of  bounding  the  error  in  a 
perturbed  linear  program.  The  result  we  obtain  in  this  case  is  stated  in 
the  following  theorem. 

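THEOREM 5: Consider the linear programs

\[
\min_x \{p^T x \mid Ax \le b\} \tag{3}
\]

and

\[
\min_x \{\hat{p}^T x \mid \hat{A}x \le \hat{b}\} \tag{4}
\]

where $A$ and $\hat{A}$ are $m \times n$ real matrices, and where $b$, $\hat{b}$ and $p$, $\hat{p}$ are
vectors in $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively. Let $\bar{x}$ solve (3). Then there exist
constants $\theta$, $\zeta$ and $\sigma$ depending only on $A$, $b$, $p$ and $\bar{x}$, such that
if $\hat{x}$ and $\hat{u}$ are respectively primal and dual solutions of (4), and if we
define

\[
\begin{aligned}
\eta &:= \zeta + \theta\|\hat{x} - \bar{x}\|_\infty, \\
\xi &:= \zeta + \sigma\|\hat{x} - \bar{x}\|_\infty, \\
\alpha &:= \eta\|\hat{x}\|_\infty + \xi\|\hat{u}\|_1\|\bar{x}\|_\infty, \\
\beta &:= \eta + \xi\|\hat{u}\|_1,
\end{aligned}
\]

and

\[
\nu := \eta\|\hat{x}\|_1 + \xi\|\bar{x}\|_1,
\]

then there is a vector $x \in \mathbb{R}^n$ such that $x$ solves (3) and

\[
\|x - \hat{x}\|_\infty \le \alpha\|A - \hat{A}\|_\infty + \beta\|b - \hat{b}\|_\infty + \nu\|p - \hat{p}\|_\infty .
\]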

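PROOF: Suppose $p^T\bar{x} = \lambda$ and $\hat{p}^T\hat{x} = \hat{\lambda}$. We shall first obtain a
lower bound on $\lambda - \hat{\lambda}$. Since $\hat{u}$ solves the dual of (4),

\[
\max_u \{-u^T\hat{b} \mid u^T\hat{A} + \hat{p}^T = 0,\ u \ge 0\}, \tag{5}
\]

we have $\hat{u}^T(\hat{A}\hat{x} - \hat{b}) = 0$ and

\[
\begin{aligned}
\hat{p}^T\hat{x} &= -\hat{u}^T\hat{b} \\
&= -\hat{u}^T\hat{b} + (\hat{u}^T\hat{A} + \hat{p}^T)\bar{x} \\
&= \hat{p}^T\bar{x} + \hat{u}^T(\hat{A}\bar{x} - \hat{b}),
\end{aligned}
\]

so

\[
\begin{aligned}
\lambda - \hat{\lambda} = p^T\bar{x} - \hat{p}^T\hat{x} &= (p - \hat{p})^T\bar{x} - \hat{u}^T(\hat{A}\bar{x} - \hat{b}) \\
&\ge (p - \hat{p})^T\bar{x} - \hat{u}^T(\hat{A}\bar{x} - \hat{b}) + \hat{u}^T(A\bar{x} - b) \\
&= (p - \hat{p})^T\bar{x} + \hat{u}^T[(A - \hat{A})\bar{x} - (b - \hat{b})].
\end{aligned} \tag{6}
\]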


We remark that we could also have obtained an upper bound in
terms of $\hat{x}$ and a dual solution to (3). Extensions of this procedure to
convex programs are relatively easy to obtain via the Kuhn-Tucker
saddlepoint conditions.

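Now we consider the two systems of linear inequalities

\[
\begin{bmatrix} A \\ p^T \end{bmatrix} x \le \begin{bmatrix} b \\ \lambda \end{bmatrix} \tag{7}
\]

and

\[
\begin{bmatrix} \hat{A} \\ \hat{p}^T \end{bmatrix} x \le \begin{bmatrix} \hat{b} \\ \hat{\lambda} \end{bmatrix}. \tag{8}
\]

Clearly any feasible point of (7) or (8) is an optimal solution of (3) or (4)
respectively. We now apply Theorem 4 to obtain the result we are looking
for. Just as in the proof of that theorem we find that there is an optimal
solution $x$ of (3) satisfying

\[
\|x - \hat{x}\|_\infty = (u_1 - u_2)^T(\hat{x} - \bar{x}) - v_1^T(b_1 - A_1\bar{x}), \tag{9}
\]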
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with  optimal  dual  variables  u^  and  satisfying  the  following 
constraints: 


—  —  _T  ~T  ~  T 

ul  -  u2  '  V1A1  •  V2A2  ■  “P  =  ° 


(ul  +  u2)Te 


ul»  U2>  vl*  V2’  w 


=  1 


>  0  . 


(10) 


J 


Here  A  ,  A  ,  b  and  b  are  defined  just  as  in  Theorem  4.  Note  that 

14  1  4 

once $u_1$, $u_2$ and $v_1$ are fixed, the variables $v_2$ and $\omega$ in (10) may
assume any non-negative values consistent with the equations of (10);
the duality theorem guarantees the existence of at least one non-negative
pair satisfying those equations.

By arguing as in the proof of Theorem 4, we obtain the following
results, in which $\delta$ is defined as previously.

a. $\|v_1\|_1 \le \delta\,\|\bar{x} - \hat{x}\|_\infty$.

b. There are feasible solutions $v_2$ and $\omega$ to (10), and a constant
$\tau_4$ depending only on $A_2$ and $p$, such that

$$\|[v_2, \omega]\|_\infty \le \tau_4\,\|u_1 - u_2 - A_1^T v_1\|_1 \le \tau_4\,(1 + \delta\,\|A_1\|_\infty\,\|\bar{x} - \hat{x}\|_\infty).$$

c.  II x  -  x  <  [Vj  v2  u] 

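Using (a.), (b.), and (c.), we obtain with $\theta := (1 + \tau_4\,\|A_1\|_\infty)\,\delta$ and
$\eta := \tau_4 + \theta\,\|\bar{x} - \hat{x}\|_\infty$,

$$\|\tilde{x} - \hat{x}\|_\infty \le \eta\,\big((\|A - \hat{A}\|_\infty + \|p - \hat{p}\|_1)\,\|\hat{x}\|_\infty + \|b - \hat{b}\|_\infty\big) - \omega(\bar{\lambda} - \hat{\lambda}). \qquad (11)$$

But from (6), we obtain

$$-\omega(\bar{\lambda} - \hat{\lambda}) \le \omega\,\{\|p - \hat{p}\|_1\,\|\bar{x}\|_\infty + \|\hat{u}\|_1\,(\|A - \hat{A}\|_\infty\,\|\bar{x}\|_\infty + \|b - \hat{b}\|_\infty)\}, \qquad (12)$$

and we have

$$0 \le \omega \le \|[v_2^T\ \omega]\|_\infty \le \tau_4\,(1 + \delta\,\|A_1\|_\infty\,\|\bar{x} - \hat{x}\|_\infty) =: \tau_4 + \sigma\,\|\bar{x} - \hat{x}\|_\infty. \qquad (13)$$

Putting together (11), (12) and (13), and defining $\xi := \tau_4 + \sigma\,\|\bar{x} - \hat{x}\|_\infty$,
we obtain

$$\|\tilde{x} - \hat{x}\|_\infty \le (\eta\,\|\hat{x}\|_\infty + \xi\,\|\hat{u}\|_1\,\|\bar{x}\|_\infty)\,\|A - \hat{A}\|_\infty + (\eta + \xi\,\|\hat{u}\|_1)\,\|b - \hat{b}\|_\infty + (\eta\,\|\hat{x}\|_\infty + \xi\,\|\bar{x}\|_\infty)\,\|p - \hat{p}\|_1 ,$$

which completes the proof.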

The bound given by Theorem 5 involves the dual variables $\hat{u}$.
In some cases this may be quite acceptable, if we know some convenient
bound on $\|\hat{u}\|_1$. However, if we have no such information then the presence
of $\hat{u}$ may be unsatisfactory. In order to eliminate the dependence of the
error bound on these variables we must find some way of bounding them
in terms of the primal variables and the other parameters of the problem.
One way to do this would be to apply Theorem 1 to bound the individual u's
in terms of p([A b]), then use Theorem 3 to bound the latter quantity in
terms of n([A b]); this could be done if the augmented matrix [A b] satisfied
the PLI condition. It is not hard to show that if the system of inequalities
$Ax \le b$ is consistent, then [A b] satisfies the PLI condition if and only if
$Ax \le b$ is superconsistent (that is, $Ay < b$ for some $y \in R^n$). In the latter
case, it is simpler to bound the dual variables directly, and that is what we
shall do in the next theorem.

THEOREM 6: Let $\epsilon$ be a positive real number. If there is a $y$
such that $Ay \le b - \epsilon e$, then for any $p \in R^n$ and any primal and dual optimal
solutions $(\bar{x}, \bar{u})$ of the program $\min\{p^T x \mid Ax \le b\}$, we have
$\|\bar{u}\|_1 \le \epsilon^{-1} p^T (y - \bar{x})$.

PROOF: Since $\bar{x}$ and $\bar{u}$ are primal and dual optimal for the given
program, we have

$$\bar{u}^T A + p^T = 0,$$

$$\bar{u}^T b + p^T \bar{x} = 0,$$

$$\bar{u} \ge 0 .$$

Multiplying the first equation by $y$ and subtracting it from the second, we
find that $\bar{u}^T(b - Ay) = p^T(y - \bar{x})$. Thus
$\epsilon\,\|\bar{u}\|_1 = \bar{u}^T(\epsilon e) \le \bar{u}^T(b - Ay) = p^T(y - \bar{x})$,
or $\|\bar{u}\|_1 \le \epsilon^{-1} p^T(y - \bar{x})$, as was to have been shown.
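
As a small numerical illustration of Theorem 6 (added here, not part of the original paper), the sketch below checks the bound on a one-variable program chosen for convenience, $\min\{x \mid x \le 1,\ -x \le 1\}$, whose primal and dual solutions can be verified by hand. The data, the point $y = 0$, the margin $\epsilon = 1$ and all function names are assumptions made only for this example.

```python
# Hypothetical one-variable instance of min{p^T x | Ax <= b}; all data are
# illustrative assumptions, not taken from the paper.
A = [[1.0], [-1.0]]      # constraints  x <= 1  and  -x <= 1
b = [1.0, 1.0]
p = [1.0]

x_bar = [-1.0]           # primal optimal solution (found by inspection)
u_bar = [0.0, 1.0]       # dual optimal solution (found by inspection)

y = [0.0]                # point with margin: A y <= b - eps * e
eps = 1.0


def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def optimality_residuals(A, b, p, x, u):
    """Residuals of u^T A + p^T = 0 and u^T b + p^T x = 0."""
    stationarity = [sum(u[i] * A[i][j] for i in range(len(A))) + p[j]
                    for j in range(len(p))]
    duality_gap = dot(u, b) + dot(p, x)
    return stationarity, duality_gap


def theorem6_bound(p, x, y, eps):
    """Right-hand side eps^{-1} p^T (y - x) of Theorem 6."""
    return dot(p, [yi - xi for yi, xi in zip(y, x)]) / eps


stationarity, gap = optimality_residuals(A, b, p, x_bar, u_bar)
margin_ok = all(dot(row, y) <= bi - eps for row, bi in zip(A, b))
u_norm1 = sum(abs(ui) for ui in u_bar)
bound = theorem6_bound(p, x_bar, y, eps)

print("optimality residuals:", stationarity, gap)
print("margin condition holds:", margin_ok)
print("||u||_1 =", u_norm1, " bound =", bound)
```

In this instance the bound happens to hold with equality, because the only constraint with a nonzero multiplier has slack exactly $\epsilon$ at $y$.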
With this theorem, we can now prove the main result of the paper.
In what follows, subscripts indicate membership in a sequence (not components).

THEOREM 7: Suppose the program $\min\{p^T x \mid Ax \le b\}$ is solvable;
suppose further that the system $Ax \le b$ is superconsistent. Let $\{A_n\}$, $\{b_n\}$,
and $\{p_n\}$ be sequences with $A_n \to A$, $b_n \to b$, and $p_n \to p$. Suppose a
bounded sequence $\{x_n\}$ exists with the property that for each n, $x_n$ solves
$\min\{p_n^T x \mid A_n x \le b_n\}$. Then there is a constant $\tau$ such that for every n,

$$d_n \le \tau\,(\|A - A_n\|_\infty + \|b - b_n\|_\infty + \|p^T - p_n^T\|_\infty),$$

where $d_n$ is the distance from $x_n$ to the solution set of $\min\{p^T x \mid Ax \le b\}$.

PROOF: We shall let $d_n$ be the $\ell_\infty$ distance; the theorem is of
course true for distance in any other norm, since all are equivalent to $\|\cdot\|_\infty$.

Since $Ax \le b$ is superconsistent, we can find a real number $\epsilon > 0$
and a vector $y \in R^n$ such that $Ay - b \le -\epsilon e$. Also, since $A_n \to A$ and
$b_n \to b$, there is some integer N such that for all $n \ge N$ we have
$A_n y - b_n \le -\tfrac{1}{2}\epsilon e$. Let $\|x_n\|_\infty \le B$ for all n; then if $\{u_n\}$ is any sequence
of dual solutions corresponding to $\{x_n\}$ we must have by Theorem 6, for all
$n \ge N$,

$$\|u_n\|_1 \le 2\epsilon^{-1} p_n^T (y - x_n) \le 2\epsilon^{-1} C\,(\|y\|_\infty + B),$$

where C is a bound on $\|p_n^T\|_\infty$ (the sequence $\{p_n\}$, being convergent,
must be bounded). Now let $\bar{x}$ solve $\min\{p^T x \mid Ax \le b\}$, and apply Theorem 5
with $\hat{x} = x_n$. For $n \ge N$, we can establish uniform bounds for the quantities
appearing in that theorem as follows (note that $\eta$, $\alpha$, $\beta$, and $\gamma$ all depend
on n, though for simplicity we do not subscript them):

a. $\theta$, $\tau_4$, and $\sigma$ are constants.

b. $\eta := \tau_4 + \theta\,\|\bar{x} - x_n\|_\infty \le \tau_4 + \theta\,(\|\bar{x}\|_\infty + B)$, and $\xi$ can be
similarly bounded.

c. $\alpha := \eta\,\|x_n\|_\infty + \xi\,\|u_n\|_1\,\|\bar{x}\|_\infty \le \eta B + 2\xi\epsilon^{-1} C\,(\|y\|_\infty + B)\,\|\bar{x}\|_\infty$;
$\beta$ and $\gamma$ can be bounded in the same way.

Thus, since N is finite we can find some constant $\tau$ such that for
every n we have $\max(\alpha, \beta, \gamma) \le \tau$, and therefore if $\tilde{x}$ is as in Theorem 5,

$$d_n \le \|\tilde{x} - x_n\|_\infty \le \tau\,(\|A - A_n\|_\infty + \|b - b_n\|_\infty + \|p^T - p_n^T\|_\infty),$$

as was to have been shown.

We shall give two simple examples here to show how Theorem 7
(but not Theorem 5) can fail if either the condition of superconsistency or
that of the boundedness of $\{x_n\}$ is dropped. The first example is a sequence

of  programs  over  R  with  the  following  parameters:  A^  =  [  ^],  b^  =  [  j/n], 

$p_n = [1]$. These programs satisfy the boundedness requirement ($x_n = 1$ for
each n) but the limit program is not superconsistent. It is easily verified that
$d_n = 1$ for each n, so the conclusion of Theorem 7 does not hold. Theorem 5
is still valid, although in a useless form since the dual variables are given
by $u_n = [0\ \ n]$, and are thus unbounded. This example also illustrates the
fact that it is not sufficient to assume in place of superconsistency that the
feasible region $\{x \mid Ax \le b\}$ has an interior.
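
The matrix entries of this first example are only partly legible in the present copy. The sketch below therefore works with one instance that reproduces every property stated in the text ($x_n = 1$, $u_n = [0\ \ n]$, $d_n = 1$, and a limit program that is not superconsistent although its feasible region has an interior). The specific data $A_n = [-1;\ -1/n]$, $b_n = [0;\ -1/n]$, $p_n = [1]$ are an assumption made here for illustration, not a reading of the original, and the helper names are hypothetical.

```python
from fractions import Fraction


def instance(n):
    """Assumed data A_n, b_n, p_n for the one-variable example (hypothetical)."""
    A_n = [Fraction(-1), Fraction(-1, n)]    # the two rows of a 2x1 matrix
    b_n = [Fraction(0), Fraction(-1, n)]
    p_n = Fraction(1)
    return A_n, b_n, p_n


def solve_1d_min(A, b, p):
    """Solve min{p*x | a_i*x <= b_i} when p > 0 and every a_i <= 0.

    Rows with a_i < 0 are lower bounds x >= b_i / a_i; rows with a_i == 0
    only require b_i >= 0. The minimizer is the largest lower bound.
    """
    assert p > 0 and all(a <= 0 for a in A)
    assert all(bi >= 0 for a, bi in zip(A, b) if a == 0)
    return max(bi / a for a, bi in zip(A, b) if a < 0)


def is_dual_optimal(A, b, p, x, u):
    """Check u >= 0, u^T A + p = 0 and u^T b + p x = 0."""
    return (all(ui >= 0 for ui in u)
            and sum(ui * a for ui, a in zip(u, A)) + p == 0
            and sum(ui * bi for ui, bi in zip(u, b)) + p * x == 0)


# Limit program: A = [-1; 0], b = [0; 0], p = 1, i.e. min{x | x >= 0}.
# Its second row reads 0*x <= 0, so no y satisfies Ay < b strictly.
A_lim = [Fraction(-1), Fraction(0)]
b_lim = [Fraction(0), Fraction(0)]
p_lim = Fraction(1)
x_lim = solve_1d_min(A_lim, b_lim, p_lim)    # unique solution of the limit program

results = []
for n in (1, 10, 100):
    A_n, b_n, p_n = instance(n)
    x_n = solve_1d_min(A_n, b_n, p_n)
    u_n = [Fraction(0), Fraction(n)]
    d_n = abs(x_n - x_lim)                   # distance to the solution set {x_lim}
    perturbation = (max(abs(a - an) for a, an in zip(A_lim, A_n))
                    + max(abs(c - cn) for c, cn in zip(b_lim, b_n))
                    + abs(p_lim - p_n))
    results.append((n, x_n, d_n, perturbation,
                    is_dual_optimal(A_n, b_n, p_n, x_n, u_n)))

for row in results:
    print(row)
```

For this assumed instance the perturbation shrinks like $2/n$ while $d_n$ stays at 1, so no constant $\tau$ can satisfy the conclusion of Theorem 7.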

The second example consists of a sequence of programs over $R^2$,
with

$$A_n = [-1/n\ \ -1], \qquad b_n = [0], \qquad p_n = [1/n\ \ 1].$$

If we let $x_n = [n\ \ -1]^T$, we again have $d_n = 1$ for each n. As before,
Theorem 5 holds but provides no useful information since $\{x_n\}$ is unbounded.
If, on the other hand, we had taken $x_n = [1\ \ -1/n]^T$, then $\{x_n\}$ would have
been bounded and Theorem 7 would have been applicable. It is perhaps
worthwhile to point out here that Theorem 7 does not require that all sequences
of solutions to the perturbed programs be bounded, but only that at least one
bounded sequence exist.
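
The second example can be checked directly. The sketch below is an illustration added here (the helper names are hypothetical); it uses the fact that the limit program $\min\{x_2 \mid -x_2 \le 0\}$ has the solution set $\{x \mid x_2 = 0\}$, so that the $\ell_\infty$ distance from a point to that set is $|x_2|$.

```python
from fractions import Fraction


def perturbed_data(n):
    """A_n, b_n, p_n of the second example."""
    A_n = [Fraction(-1, n), Fraction(-1)]    # single row of a 1x2 matrix
    b_n = Fraction(0)
    p_n = [Fraction(1, n), Fraction(1)]
    return A_n, b_n, p_n


def is_optimal(A_n, b_n, p_n, x):
    """x solves min{p_n^T x | A_n x <= b_n} iff it is feasible and p_n^T x = 0.

    Here p_n = -A_n and b_n = 0, so every feasible point has p_n^T x >= 0
    and the optimal value is 0.
    """
    feasible = sum(a * xi for a, xi in zip(A_n, x)) <= b_n
    return feasible and sum(p * xi for p, xi in zip(p_n, x)) == 0


def distance_to_limit_solutions(x):
    """l-infinity distance from x to {x | x_2 = 0}, the limit solution set."""
    return abs(x[1])


def perturbation_size(n):
    """||A - A_n|| + ||b - b_n|| + ||p^T - p_n^T|| (infinity norms),
    with limit data A = [0 -1], b = [0], p = [0 1]."""
    A_n, b_n, p_n = perturbed_data(n)
    A, b, p = [Fraction(0), Fraction(-1)], Fraction(0), [Fraction(0), Fraction(1)]

    def row_norm(r, s):
        return sum(abs(ri - si) for ri, si in zip(r, s))

    return row_norm(A, A_n) + abs(b - b_n) + row_norm(p, p_n)


for n in (1, 10, 100):
    data = perturbed_data(n)
    unbounded = [Fraction(n), Fraction(-1)]      # x_n = [n, -1]^T
    bounded = [Fraction(1), Fraction(-1, n)]     # x_n = [1, -1/n]^T
    assert is_optimal(*data, unbounded) and is_optimal(*data, bounded)
    print(n, distance_to_limit_solutions(unbounded),
          distance_to_limit_solutions(bounded), perturbation_size(n))
```

With the bounded choice the distance is $1/n$, half of the perturbation size $2/n$, which is the kind of linear bound Theorem 7 asserts; with the unbounded choice the distance stays at 1.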

It should be noted that instead of requiring $\{x_n\}$ to be bounded in
Theorem 7, we could have allowed $\|x_n\|_\infty$ to appear explicitly in the bound
for $d_n$; then, if one had some growth condition on $\|x_n\|_\infty$ and if $A_n$, $b_n$
and $p_n$ converged quickly enough, one might still show that the $d_n$ must
converge to zero at a certain rate. Also, we could have incorporated a continuous
perturbation into A, b, and p, instead of using sequences (although then one
has to be careful to maintain superconsistency). However, we have not thought
it necessary to carry out these details here.
Conversion Factors, British to Metric Units of Measurement

British units of measurement used in this report can be converted to
metric units as follows:

Multiply                  By           To Obtain

inches                    25.4         millimeters
inch-pounds               0.1129848    meter-newtons
pounds                    4.448222     newtons
pounds per square inch    6.894757     kilonewtons per square meter

The  remainder  of  this  paper  has  been  reproduced  photographically  from 
the  author's  manuscript. 
Summary


The objective of this study was to evaluate a finite element method
for predicting the response of a cylindrical shell buried in sand and
subjected to a static surface pressure. Test results for cylinders of three
thicknesses (3/8, 1/4, and 1/8 in.) were compared with the response
calculated using the finite element computer code.

Comparison of the computer program results with the experimental
results revealed a wide discrepancy between the two. Analysis of the
results indicated that it is doubtful that any predicting scheme which uses
a homogeneous soil model will be successful, since values of moment and
thrust which develop in stiff cylinders are very sensitive to the immediate
soil environment of the cylinder and it is difficult to obtain a uniform
backfill in this region even in the laboratory.

Based on the results reported, it is not possible to make a general
assessment of the worth of the finite element approach to the analysis of
buried structures. However, the findings are valuable in that they:

a. Point out the need to know the characteristics of the soil
backfill in order to use the finite element analysis to
predict the response of buried cylinders.

b. Serve as a guide to future extension of the experimental
program by pointing out the difficulty of constructing a
uniform backfill even in the laboratory.

c. Bring out the increased significance placed on backfill
characteristics due to the stiff cylinders used in the
experimental program.

d. Show the danger of extrapolating design procedures derived
from flexible cylinder data.
Introduction


Background

1. In the design of buried conduits, ways and methods are constantly
being sought to reduce the cost of construction. To accomplish this and
still ensure that the conduits function properly requires a means of
predicting the load distribution on the buried structures.

Objective

2. The objective of this study was to evaluate a finite element
method for predicting the response of a cylindrical shell buried in sand
and subjected to a static surface pressure.

Test geometry

3. For the test configuration illustrated in fig. 1, the testing
arrangement should provide a plane strain condition for the soil and this is
used in the finite element idealization.

4. Fig. 2 shows the thickness of the three cylinders treated in the
analysis (3/8, 1/4, and 1/8 in.*). The detailed diagram of fig. 3 indicates
the arrangement of these test specimens to produce a plane stress
configuration for each cylinder.

Soil model

5. In formulating any equivalent continuum model for a soil medium,
one is always faced with the problem of the limitations on the type of
tests which can be performed with reasonable accuracy and must be aware of
the geometry associated with these tests. The two tests used in this
approach are (a) the constrained modulus test shown in fig. 4 and (b) the
triaxial compression test illustrated by fig. 5.

6. The information contained in the experimental curve of fig. 4 is
input to the finite element computer code in tabular form by partitioning
the curve into five straight-line segments.
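
A tabular input of this kind can be pictured with the short sketch below. It is an illustration only: the six breakpoints are hypothetical values (the actual points read from fig. 4 are not given in the text), and the function names are not those of the computer code used in the study.

```python
from bisect import bisect_right

# Hypothetical breakpoints (strain, stress in psi) defining five straight-line
# segments; the values actually read from fig. 4 are not given in the text.
STRAIN = [0.000, 0.002, 0.005, 0.010, 0.020, 0.040]
STRESS = [0.0, 40.0, 85.0, 140.0, 230.0, 380.0]


def segment_index(strain):
    """Index of the straight-line segment containing `strain` (clamped to the table)."""
    i = bisect_right(STRAIN, strain) - 1
    return min(max(i, 0), len(STRAIN) - 2)


def constrained_modulus(strain):
    """Slope of the segment containing `strain`, i.e. the tangent constrained modulus."""
    i = segment_index(strain)
    return (STRESS[i + 1] - STRESS[i]) / (STRAIN[i + 1] - STRAIN[i])


def stress_at(strain):
    """Stress obtained by linear interpolation within the segment."""
    i = segment_index(strain)
    return STRESS[i] + constrained_modulus(strain) * (strain - STRAIN[i])


for strain in (0.001, 0.0035, 0.015, 0.030):
    print(strain, stress_at(strain), constrained_modulus(strain))
```

Each of the five segments carries a single modulus value, so the modulus supplied to an element changes only when its strain crosses a breakpoint.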

7. Curves shown in fig. 5 are used to determine the constants in
the relation

* A table of factors for converting British units of measurement to metric
units is presented on page vii.
S2  = 


2d 


+  ^2 

y  a 


Cl) 


where $\mu$ is the "initial" shear modulus, $\alpha$ is associated with a "yield"
surface, and the two invariants appearing in equation 1 are the second
invariants of the deviatoric stress and strain, respectively. Using
equation 1 to define the shear modulus


G  = 


2|i 


1  + 


y  a 


(2) 


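With the aid of equation 2 and the information of fig. 4, an incremental
constitutive relation of the form

$$\begin{Bmatrix} \sigma_x \\ \sigma_y \\ \sigma_{xy} \end{Bmatrix} =
\begin{bmatrix}
K + \tfrac{4}{3}G & K - \tfrac{2}{3}G & 0 \\
K - \tfrac{2}{3}G & K + \tfrac{4}{3}G & 0 \\
0 & 0 & 2G
\end{bmatrix}
\begin{Bmatrix} \epsilon_x \\ \epsilon_y \\ \epsilon_{xy} \end{Bmatrix}$$

is used for each soil element in the finite element model. In this
relation, G is defined by equation 2 and K is the bulk modulus.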
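
A minimal sketch of how such an element matrix can be assembled from K and G is given below. It is an illustration added here, not the code used in the study. The helper `bulk_modulus` relies on the standard elasticity identity that the constrained modulus equals K + 4G/3; that this is how K was obtained from fig. 4 is an assumption, and all numerical values are hypothetical.

```python
def plane_strain_matrix(K, G):
    """3x3 matrix relating (eps_x, eps_y, eps_xy) increments to
    (sigma_x, sigma_y, sigma_xy) increments for bulk modulus K and shear modulus G."""
    return [
        [K + 4.0 * G / 3.0, K - 2.0 * G / 3.0, 0.0],
        [K - 2.0 * G / 3.0, K + 4.0 * G / 3.0, 0.0],
        [0.0, 0.0, 2.0 * G],
    ]


def bulk_modulus(constrained_modulus, G):
    """Bulk modulus from a constrained modulus M = K + 4G/3 (assumed procedure)."""
    return constrained_modulus - 4.0 * G / 3.0


def stress_increment(K, G, d_eps):
    """Multiply the element matrix by a strain increment [d_eps_x, d_eps_y, d_eps_xy]."""
    D = plane_strain_matrix(K, G)
    return [sum(D[i][j] * d_eps[j] for j in range(3)) for i in range(3)]


# Hypothetical moduli in psi and a small uniaxial strain increment.
G = 1500.0
K = bulk_modulus(5000.0, G)
print(stress_increment(K, G, [0.001, 0.0, 0.0]))
```

Under uniaxial strain the first stress component reduces to the constrained modulus times the strain increment, which is a convenient check on the matrix.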


Analysis and Comparison of Results

8. The approach used in this study to include the physical
nonlinearities of the soil medium is that of adding a succession of linear
incremental problems. For the incremental problems, each element is assigned
linearly elastic constants based on the current states of stress and strain
that exist in that element. As one should expect, this type of detailed
analysis is costly in computer time. For instance, in the following results,
50 incremental problems were used for each cylinder configuration,
requiring approximately 21 minutes of Univac 1106 computer time.
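
The idea of summing linear incremental problems can be illustrated on a single degree of freedom. The sketch below is not the finite element code of the study: the hyperbolic stress-strain law, its constants, and the function names are assumptions chosen so that the exact answer is known and the effect of the number of increments can be seen.

```python
def tangent_modulus(strain, e0=10000.0, strain_ref=0.05):
    """Tangent modulus of the assumed law stress = e0*strain / (1 + strain/strain_ref)."""
    return e0 / (1.0 + strain / strain_ref) ** 2


def incremental_strain(total_stress, increments):
    """Apply the load in equal increments, holding the modulus fixed within each one."""
    strain = 0.0
    d_stress = total_stress / increments
    for _ in range(increments):
        modulus = tangent_modulus(strain)   # elastic constant from the current state
        strain += d_stress / modulus        # linear problem for this increment
    return strain


def exact_strain(total_stress, e0=10000.0, strain_ref=0.05):
    """Closed-form strain for the assumed law, used only as a reference."""
    return total_stress / (e0 - total_stress / strain_ref)


for n in (5, 50, 500):
    print(n, incremental_strain(250.0, n), exact_strain(250.0))
```

Because the modulus is frozen at the start of each increment, the accumulated strain approaches the reference value only as the number of increments grows, which is why a fine load subdivision (and the associated computer time) is needed.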

9. The boundary value problem described in fig. 1b is partitioned
into 334 elements using 34? node points as shown in fig. 6. The cylinders
are represented in the finite element idealization using two elements to
span their wall thicknesses.

10. Fig. 7 shows comparisons of the calculated values of the change
in vertical diameter with the experimental values for the 3/8- and 1/8-in.
cylinders. The solid line is the experimentally measured change in vertical
diameter with respect to static pressure on the soil surface. As can be
seen, the calculated values fall consistently below those measured in the
experiment.

11. Fig. 8 shows measured and calculated moments for the 3/8-in.
cylinder. The pattern of variation for the other two cylinder sizes was
similar. The discrepancies in the graphs are consistent with the
deviations between calculated and measured deflection displayed in fig. 7.
Measured and calculated thrusts for all three cylinder sizes are shown in figs.
9-11. The experimental thrust calculation involves taking the difference
of two recorded strain gage measurements, and this accounts for the erratic
variations of the thrust curves. Even though the oscillation in the
experimental thrust curves indicates scatter in the data, it is felt that the
curves reflect global behavior of the thrust, and that the large deviation
of the calculational results is meaningful.

12. The deformed shapes of the cylinders were calculated by
subtracting out the rigid body motion of the cylinders (fig. 12). Only the
vertical diameter change was measured in the experiment, and this data
point is indicated in the figure. The deformation patterns are
substantially as one would expect under the loading conditions.

13. Figs. 13-15 are comparisons of the calculated vertical stresses
at the crown with the calculated horizontal stresses at the spring line in
the soil elements adjacent to the cylinders. These graphs indicate the
relative stiffness of the cylinder to that of the soil: the 3/8-in.
cylinder giving a $\sigma_{yy}$ at the crown greater than $\sigma_{xx}$ at the spring line; the
1/4-in. cylinder showing the two stress components to be relatively close
together; the 1/8-in. cylinder having the $\sigma_{yy}$ at the crown less than $\sigma_{xx}$
at the spring line. In addition, these curves are indicative of the lateral
load developed on the cylinder in the finite element analysis of the
problem. The previous discussion concerning the experimental thrust curves
shows that this lateral load did not develop in the experiments, especially
on the stiffer cylinders.

14. In fig. 16, the solid lines give the stress components measured
in the soil in a typical test as a function of the surface pressure. The
points on these graphs represent the calculated stress components at the
same location. The agreement between the calculated and measured values
is an indication that the equivalent continuum model is representative of
the soil insofar as the free-field stress components are concerned.

15. The calculated deflections of the soil surface for the three
cylinder configurations are shown in fig. 17. These curves are also
indicative of the relative stiffness of the cylinders as compared to the soil
model.

16. Figs. 18 and 19 display the stress components for the entire
soil field for the 3/8-in. cylinder with an applied surface pressure of
500 psi. The measured values are shown in parentheses at the location of
the measuring gage. These are the peak values taken from curves such
as that shown in fig. 16.

17. The discrepancies between calculated values and experimental
curves indicate a pattern that suggests that the poor agreement between
experimental and calculated values results from the use of a single soil
model for the sand medium rather than from the type of soil model used in
the calculations. The placement of the cylinders and the subsequent burial
procedures apparently produce a loose zone of sand just below the spring
line and adjacent to the cylinders. This explanation of the experimental
results is discussed in more detail in the next section.

Conclusions and Recommendations

18. Although the geometry involved in this study is not complex, the
nature of the problem was such that it presented a crucial test for a
soil-structure interaction code. The stiffness of the cylinders involved varied
sufficiently to cover cases ranging from active arching (the structure
carries less load proportionally than the surrounding soil) to passive arching
(the structure carries more load proportionally than the surrounding soil).
Thus, the loading conditions of the soil in the vicinity of the cylinders
differed significantly from the 3/8-in. cylinder to the 1/8-in. cylinder.

19. Comparison of the computer program results with the experimental
results revealed a wide discrepancy between the two. There are several
indicators that the environment of the cylinders was different in the model
from that of the tests. For instance, the recorded free-field components
of stress are in reasonable agreement with the calculated values, but the
measured values of thrust at the crown of the 3/8-in. cylinder indicate
that it received essentially no lateral support from the soil at these
pressure levels. On the other hand, the measured values of thrust at the
crown of the 1/8-in. cylinder indicate that it received lateral support
from the soil. Fig. 12 indicates the different magnitudes of displacement
experienced by the soil in the region of the spring line of the cylinders.
This suggests that a reduced density of the sand adjacent to the spring line
due to placement techniques would cause the cylinders to support a surface
load in quite a different manner. The stiff cylinder would deflect very
little at the spring line, and thus would derive very little support from
the surrounding soil due to the reduced density zone. However, the
relatively flexible cylinder would experience a deflection at the spring line
that would mask the reduced density zone. Of course, if the compression
mode of the cylinder is not mobilized, the load must be carried as moment.
Thus, the moment would be greater than expected in such cases. This is
apparently the situation that developed in the experiments considered in
this analysis.

20. It is doubtful that any predicting scheme which uses a homogeneous
soil model will be successful. The values of the moment and thrust which
develop in stiff cylinders are very sensitive to the immediate soil
environment of the cylinder. This is due primarily to small deformation of the
cylinder-soil interface. Thus, irregularities which are masked by the
larger deflections associated with the flexible cylinders become extremely
important for stiff cylinders. Therefore, the analysis of stiff cylinders
is compounded by the fact that a detailed knowledge of the interface zone
is needed, and yet this is the region about which the least is known.
Meaningful measurements in this region are difficult to obtain.

21. One method of attack on such problems would be to consider a
region below the spring line and adjacent to the cylinder (the region where
the sand placement is most difficult) as a material with properties
different from the other sand. By making computer calculations using such a
heterogeneous soil model, the influence of a reduced density zone could be
varied in subsequent computer calculations in an attempt to obtain the
experimental values for moment and thrust.

22. If it is possible to determine a heterogeneous soil model which
predicts the high moment and low thrust response as measured in these
experiments, the same scheme can be examined for buried structures of other
shapes. By extending the investigation to cylinders of larger diameter,
perhaps the normal component of stress can be measured around the cylinder,
giving an experimental indication of the value of the lateral load on the
buried cylinders. Therefore, it is recommended that future effort be
directed along this approach.

23. Even though the calculational results differed significantly
from the experimental curves, these findings are useful in that they:

a. Point out the need to know the characteristics of the soil
backfill in order to use the finite element analysis to
predict the response of buried cylinders.

b. Serve as a guide to future extension of the experimental
program by pointing out the difficulty of constructing a
uniform backfill even in the laboratory.

c. Bring out the increased significance placed on backfill
characteristics due to the stiff cylinders used in the
experimental program.

d. Show the danger of extrapolating design procedures derived
from flexible cylinder data.

24. However, it is not possible to make a general assessment
concerning the worth of this finite element approach to analyze buried structures
from this effort due to the unknown density distribution around the
cylinders. Certainly the calculational results cannot be expected to reflect
characteristics which are not built into the soil model. In this case,
with the stiff cylinders and relatively low surface pressure, it is
apparently the conditions at the soil-cylinder interface that dictated the mode
of response of the cylinder.


Fig. 2. Comparison of cylinder thicknesses

Fig. 3. Exploded view and geometry of test structure (panel a: exploded view)

Fig. 11. Variation of thrust with applied surface pressure for 1/8-in. cylinder
APPLICATION OF SPLINE INTERPOLATION
METHODS TO ENGINEERING PROBLEMS

J. B. Cheek, Jr., N. Radhakrishnan, F. T. Tracy
Computer Analysis Branch
Automatic Data Processing Division
US Army Engineer Waterways Experiment Station
Vicksburg, Mississippi

FOREWORD. This paper was prepared by Mr. J. B. Cheek, Jr., Dr.
SUMMARY


This paper was prepared to familiarize practicing scientists and
engineers with the cubic spline interpolation technique as a possible tool in
curve fitting for computer programs for which more commonly used techniques
may be unsuitable or of limited value. The spline technique is compared
with more common methods, specifically piece-wise linear and polynomial,
and examples of applications of the technique to engineering problems are
presented. Appendix A contains the mathematical derivation of the
equations defining the spline function, and Appendix B contains a compact
FORTRAN fitting and interpolating program.

The interpolating spline curve-fitting technique has three primary
advantages:

a. The spline passes through all data points.

b. The first and second derivatives of the spline are continuous
at all points.

c. The spline can be easily modified to satisfy new or
additional data.

The experience of the Waterways Experiment Station (WES) in applying spline
techniques to engineering problems has indicated that these advantages
outweigh the additional storage and/or computation time requirements of the
technique in many applications.

Since the spline function is required to pass through all data
points, erratic derivative behavior may result from experimental error
when the data points are numerous and closely spaced. Trial and error
methods for smoothing such functions exist, but they are time consuming.
WES experience has indicated that acceptable results can generally be
obtained by simply selecting a more limited, more widely spaced set of the
data points to which to fit the curve.
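
The first two advantages listed above can be seen in a short sketch. The code below is an illustration added here and is not the FORTRAN program of Appendix B: it fits a cubic spline with "natural" end conditions (zero second derivative at both ends), which is an assumption of this sketch, and the function names and sample data are hypothetical.

```python
from bisect import bisect_right


def second_derivatives(xs, ys):
    """Knot second derivatives m of the natural cubic spline through (xs, ys).

    Solves the tridiagonal continuity equations with the Thomas algorithm;
    the end conditions m[0] = m[-1] = 0 are an assumption of this sketch.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    lower, diag, upper, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        lower[i] = h[i - 1]
        diag[i] = 2.0 * (h[i - 1] + h[i])
        upper[i] = h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n):                     # forward elimination
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n
    m[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        m[i] = (rhs[i] - upper[i] * m[i + 1]) / diag[i]
    return m


def evaluate(xs, ys, m, x, interval=None):
    """Spline value and first derivative at x.

    `interval` forces the polynomial piece to use, so that the one-sided
    derivatives at a knot can be compared.
    """
    i = interval
    if i is None:
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    a, b = xs[i + 1] - x, x - xs[i]
    c0 = ys[i] / h - m[i] * h / 6.0
    c1 = ys[i + 1] / h - m[i + 1] * h / 6.0
    value = (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h) + c0 * a + c1 * b
    slope = (-m[i] * a ** 2 + m[i + 1] * b ** 2) / (2.0 * h) - c0 + c1
    return value, slope


# Hypothetical data points.
xs = [0.0, 1.0, 2.5, 4.0]
ys = [0.0, 2.0, 1.0, 3.0]
m = second_derivatives(xs, ys)
for x in (0.5, 1.0, 1.75, 3.0):
    print(x, evaluate(xs, ys, m, x))
```

Because each knot carries a single second derivative shared by the two adjoining cubic pieces, continuity of the second derivative is built in, and the tridiagonal equations enforce continuity of the first derivative.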
PART I


Purpose and Approach

1. Many computer programs applied to engineering problems must model
the characteristics of nonlinear physical systems. The cubic spline offers
outstanding advantages over the interpolation methods commonly used in
this class of problems. It is, therefore, the purpose of this paper to
point out the shortcomings of several methods and show how
spline techniques may be used to advantage. This purpose is approached
through discussion and examples in language and subject that are meaningful
to the research and design engineer in order to bring to the practicing
scientist and engineer an assurance that the cubic spline formulation
offers a powerful, practical modeling and interpolating method for use in
his computer codes.


Commonly Used Curve-Fitting Techniques


2. Numerical techniques and digital computers are being applied to
an increasing number of civil engineering problems. This is primarily
due to the flexibility afforded by the numerical approach, in that the
design and research engineer can easily specify complicated boundary
conditions and use nonlinear material properties. The finite difference and
finite element methods are excellent examples of this growing application
area. The valid description (modeling) of the nonlinear properties to the
computer program is a primary consideration and is the major concern of
this paper.

3.  Prob lens  of  th 
tween  physical  quaniitic 
functional  or  analytical 
given a strain, what is the stress; or given a water elevation (stage),
what is the flow (discharge). Such relationships are normally available
only as data points. It is therefore necessary to use some kind of
curve-fitting technique in the computer program to represent the physical process,
not at the data points, but in the intervals between data points.

4. There are many curve-fitting methods, but there is no ideal one.
Consequently, a major difficulty in developing the solution process is in
selecting the curve-fitting technique that is best suited to the problem at
hand.

5. Two of the most commonly used curve-fitting techniques are
piece-wise linear and polynomial methods, while hyperbolic function and other
special purpose representations are occasionally used. There are, however,
serious disadvantages in using the linear and polynomial methods. These
limitations are presented in the following paragraphs to help the reader
appreciate the advantages of the spline method.

Piece-wise linear interpolation

6. Disadvantages. Given a series of data points that, for
engineering purposes, "exactly" represent a physical situation, there is strong
motivation to use piece-wise linear interpolation between point pairs.
This is especially tempting when additional data points are easy to acquire
because the nonlinearity can be modeled (to the extent required) simply by
specifying additional points for the nonlinear function. It
appears that the only disadvantage is the additional computer memory
required for the points.

7. The serious objection of discontinuous derivatives develops when
modeling observed physical behavior and calculating rates of change
(derivatives) from the piece-wise linear model. This difficulty caused by
discontinuous derivatives is illustrated in the following example.

8. The nonlinear properties of soil in a finite element method (FEM)
solution may be represented with a set of stress-strain points, as shown
in fig. 1. The solution process requires interpolation of a stress value
for any strain value. Also required is the modulus for those same strain
values (the modulus being dY/dX, the slope of the stress-strain curve).
Fig. 1 also includes the plot of modulus versus strain. Note the abrupt
change in modulus as strain progresses from region A to region B. How does
this affect the FEM solution process?

Fig. 1. Piece-wise linear stress-strain approximation (horizontal axis:
axial strain; regions A and B marked)

9. To answer that question one must first state three characteristics of a
typical nonlinear FEM process: (a) the solution procedure involves solving
a series of incremental loading problems, (b) during the load cycle, the
modulus of each soil element is dependent on the strain at that element,
and (c) the strains increase incrementally from an initial value as
additional load cycles are taken.

10. Thus, those elements having any value of strain in region A will
have one modulus value, while those having any strain value in region B
exhibit a different modulus. Consequently, as the element strain progresses
from region A to B, the solution process sees an abrupt change in that
element's modulus. Such behavior is not characteristic of soil modulus
versus strain; i.e., the model for modulus versus strain is obviously
invalid. The effect of this abrupt change is instability in the numerical
process and questionable results.

11. Summary. Linear interpolation is an easy method to implement on the
computer. It can accurately model the observed behavior, provided a
sufficient number of data points are available; but it fails completely in
modeling derivatives and is wasteful of computer memory when stringent
limits are placed on allowable errors.
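
The jump in modulus described in paragraphs 8 through 11 can be seen in a few lines of Python. This is only a minimal sketch: the stress-strain values and the function names (`segment`, `stress_at`, `modulus_at`) are hypothetical and do not come from the paper.

```python
from bisect import bisect_right

# Hypothetical stress-strain points; the numbers are illustrative only.
strain = [0.0, 0.01, 0.02, 0.04]
stress = [0.0, 10.0, 16.0, 20.0]

def segment(x):
    """Index i of the segment [strain[i], strain[i+1]] that contains x."""
    i = bisect_right(strain, x) - 1
    return min(max(i, 0), len(strain) - 2)

def stress_at(x):
    """Piece-wise linear interpolation of stress at strain x."""
    i = segment(x)
    t = (x - strain[i]) / (strain[i + 1] - strain[i])
    return stress[i] + t * (stress[i + 1] - stress[i])

def modulus_at(x):
    """Slope dY/dX of the piece-wise linear model at strain x."""
    i = segment(x)
    return (stress[i + 1] - stress[i]) / (strain[i + 1] - strain[i])

# Stress is continuous across the data point at strain 0.01,
# but the modulus changes abruptly from one segment to the next.
just_below = modulus_at(0.01 - 1e-9)
just_above = modulus_at(0.01 + 1e-9)
print(stress_at(0.01 - 1e-9), stress_at(0.01 + 1e-9))
print(just_below, just_above)
```

An incremental solution that reads `modulus_at` for an element whose strain crosses 0.01 would see the stiffness change in a single step, which is the behavior paragraph 10 objects to.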

Polynomial  interpolation 

12. Disadvantages. Because linear interpolation has discontinuous
first derivatives, researchers have turned to higher (Nth) degree
polynomials which have the much desired continuity of their first through
(N - 1)th derivatives. In so doing they discovered difficulties and
deficiencies associated with polynomial interpolation, some of which are
outlined below:

a. Polynomials of degree N do not always pass through every
data point (if there are more than N data points).
This is objectionable when the premise is
that data points can be considered "exact" (i.e., data
points whose probable error of position is small and the
polynomial fit does not meet this error criterion).

b.  Oscillations,  as  many  as  (N  -  l)/2  ,  may  occur  between 
the  first  and  last  data  points.  Such  oscillations  may  lie 
outside  the  range  of  the  known  behavior  of  the  system  being 
modeled  or  may  incorrectly  model  derivative  behavior. 

c. It is extremely difficult to predict the overall behavior of
an Nth degree polynomial when a data point is added, deleted,
or "adjusted" (again assuming there are more than N
data points). This difficulty transforms the curve-fitting
process from a science to an art.

d.  Although  derivatives  of  polynomials  are  continuous,  one  can 
not  assume  that  they  are  representative  of  the  physical 
situation  being  modeled. 
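
The oscillation described in item b can be reproduced with a small experiment. The sketch below is an assumption-laden illustration, not the authors' procedure: it uses a degree 5 polynomial that passes exactly through six hypothetical points (Lagrange form), one of which has been deliberately displaced, and the function name `lagrange` is invented for the example.

```python
def lagrange(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs) - 1 through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for k, xk in enumerate(xs):
            if k != i:
                term *= (x - xk) / (xi - xk)
        total += term
    return total

# Hypothetical data lying on the line y = x, except that the third
# point has been moved from 2.0 to 4.0 to imitate a bad data value.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 4.0, 3.0, 4.0, 5.0]

# Sample the polynomial between the first and last data points.
for x in [0.5, 1.5, 2.5, 3.5, 4.5]:
    print(x, lagrange(xs, ys, x))
```

Between the outer pairs of points the polynomial leaves the range spanned by the data altogether, even though every data point is matched exactly. A single displaced value therefore disturbs the fit far from where the error occurred.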

13. Summary. Polynomial interpolation methods can be successfully
applied to many nonlinear problems, but a high level of education, skill,
and experience is required of the curve-fitting practitioner (the equation
maker). Secondly, polynomials can never be completely trusted. New sets
of data, even though they are from the same application, must be validated
(plotted and examined), and most likely adjusted through a trial and error
process, before the new polynomial fit can be used.


Verification  of  Interpolation  Systems 

14.  It  is  important  to  note  that  derivative  faults  are  easily  over¬ 
looked,  especially  when  the  user  restricts  his  evaluation  of  the  interpola¬ 
tion  procedure  to  testing  its  ability  to  reproduce  a  physical  effect  within 
specified  error  bounds.  As  we  have  demonstrated,  one  can  have  a  perfectly 
satisfactory  model  of  the  observed  behavior,  while  that  same  model  is  in¬ 
valid  in  other  important  physical  characteristics.  It  is,  therefore, 
highly  desirable  to  examine  all  characteristics  of  the  physical  system 
which the interpolating model is expected to reproduce.


PART II: CUBIC SPLINE INTERPOLATION

15.  The  aforementioned  reasons  provide  motivation  to  look  for  a 
better,  easy-to-use  interpolating  technique  that  will  give  smooth,  predict¬ 
able  behavior  and  have  continuous  first  derivatives.  U.  S.  Army  Engineer 

Waterways Experiment Station (WES) activity in several engineering areas

2  3 

indicates that the cubic spline (hereinafter referred to as spline) has
many advantages as an interpolating method. Several important spline
characteristics are outlined below:

a.  The  spline  passes  through  the  data  points. 

b.  The  spline  is  a  piece-wise  cubic.  The  data  points  mark  the 
points  of  transition  from  one  set  of  coefficients  to  the 
next. 

c. The first and second derivatives of the spline are continuous
at all points.

d.  The  spline  curve  looks  similar  to  that  drawn  by  a  french 
curve  or  a  mechanical  spline  (more  on  mechanical  or  physical 
splines  in  paragraphs  20  and  33). 

e.  Adjustment  of  any  data  point  affects  the  entire  curve,  but 
the  effect  is  predictable. 

f. The spline is uniquely defined by the X and Y coordinates
of each data point and either the first or second derivative
at each data point.

Mathematical  Formulation 

16.  The  mathematical  spline  function  is  a  piece-wise  third  degree 
(cubic)  polynomial  passing  through  all  data  points  and  having  continuous 
first  and  second  derivatives. 

17.  The  spline  can  be  viewed  as  a  set  of  cubic  equations,  one  equa¬ 
tion  for  each  interval  between  successive  data  points.  The  coefficients 
of  the  cubic  equations  are  such  that,  at  any  data  point,  the  equation  for 
the  left  interval  will  yield  the  same  values  for  the  first  and  second  de¬ 
rivatives,  respectively,  as  will  the  equation  for  the  right  interval. 

18. Given a set of N data points for which the coordinates $(X_i, Y_i)$
and second derivatives $(M_i)$ are known for every point (i = 1, 2, 3...N),
then the interpolating spline function S(x) is defined as

$$
S(x) = \frac{1}{6} M_i \frac{(X_{i+1} - x)^3}{X_{i+1} - X_i}
     + \frac{1}{6} M_{i+1} \frac{(x - X_i)^3}{X_{i+1} - X_i}
     + \left[ Y_i - \frac{1}{6} M_i (X_{i+1} - X_i)^2 \right] \frac{X_{i+1} - x}{X_{i+1} - X_i}
     + \left[ Y_{i+1} - \frac{1}{6} M_{i+1} (X_{i+1} - X_i)^2 \right] \frac{x - X_i}{X_{i+1} - X_i}
\qquad (1)
$$

where i is such that $X_i \le x \le X_{i+1}$. The first and second derivatives
are

$$
S'(x) = -\frac{M_i}{2} \frac{(X_{i+1} - x)^2}{X_{i+1} - X_i}
      + \frac{M_{i+1}}{2} \frac{(x - X_i)^2}{X_{i+1} - X_i}
      + \frac{Y_{i+1} - Y_i}{X_{i+1} - X_i}
      + \frac{1}{6} (M_i - M_{i+1})(X_{i+1} - X_i)
\qquad (2)
$$

$$
S''(x) = M_i \frac{X_{i+1} - x}{X_{i+1} - X_i}
       + M_{i+1} \frac{x - X_i}{X_{i+1} - X_i}
\qquad (3)
$$

19. The method of evaluating the $M_i$ coefficients is presented in
Appendix A, along with a derivation of the above expressions. A compact
fitting and interpolating FORTRAN program is included in Appendix B.
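
As a rough illustration of equations 1 through 3, the Python sketch below fits the second derivatives and then evaluates S(x), S'(x), and S''(x). It is not the Appendix B FORTRAN program. It assumes the common "natural" end conditions (second derivative zero at the first and last points), which may differ from the end conditions used in Appendix A, and the names `fit_moments` and `spline` are hypothetical.

```python
from bisect import bisect_right

def fit_moments(X, Y):
    """Second derivatives M_i of the interpolating cubic spline.

    Assumption of this sketch: natural ends, M[0] = M[-1] = 0.
    Interior rows enforce continuity of S' at each data point.
    """
    n = len(X)
    h = [X[i + 1] - X[i] for i in range(n - 1)]
    sub, diag, sup, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        sub[i] = h[i - 1]
        diag[i] = 2.0 * (h[i - 1] + h[i])
        sup[i] = h[i]
        rhs[i] = 6.0 * ((Y[i + 1] - Y[i]) / h[i] - (Y[i] - Y[i - 1]) / h[i - 1])
    # Tridiagonal (Thomas) elimination followed by back substitution.
    for i in range(1, n):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    M = [0.0] * n
    M[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        M[i] = (rhs[i] - sup[i] * M[i + 1]) / diag[i]
    return M

def spline(X, Y, M, x):
    """Return (S, S', S'') at x using equations 1, 2, and 3."""
    i = min(max(bisect_right(X, x) - 1, 0), len(X) - 2)
    h = X[i + 1] - X[i]
    a, b = X[i + 1] - x, x - X[i]
    s = ((M[i] * a**3 + M[i + 1] * b**3) / (6.0 * h)
         + (Y[i] - M[i] * h * h / 6.0) * a / h
         + (Y[i + 1] - M[i + 1] * h * h / 6.0) * b / h)
    s1 = ((-M[i] * a * a + M[i + 1] * b * b) / (2.0 * h)
          + (Y[i + 1] - Y[i]) / h + (M[i] - M[i + 1]) * h / 6.0)
    s2 = (M[i] * a + M[i + 1] * b) / h
    return s, s1, s2

# Hypothetical data, used only to exercise the two functions.
X = [0.0, 1.0, 2.0, 3.0]
Y = [0.0, 1.0, 0.0, 1.0]
M = fit_moments(X, Y)
print(spline(X, Y, M, 1.5))
```

Only X, Y, and M need to be stored per data point; the bracketed terms of equation 1 are recomputed inside `spline` each time it is called.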


Discussion of Spline Characteristics


20.  The  spline  properties  discussed  in  paragraph  15  illuminate  the 
advantages  and  disadvantages  of  splines.  Properties  a,  b,  c,  and  d  combine 
to  produce  an  accurate  curve  having  continuous  first  and  second  derivatives 

4 

This, as mentioned earlier, is very useful in engineering analysis problems.
Properties d and e provide a physically based insight for adjusting data to
obtain a better curve. In fact, the mathematical cubic spline can be
derived from the theory associated with the deflected shape of a weightless
elastic beam constrained at particular points.

21.  Greville""  noted  that  many  physical  systems  are  correctly  modeled 
by  cubic  or  quadratic  equations.  This  fact  accounts,  to  a  large  extent, 
for the popularity of cubic splines. It also enables us to obtain a
satisfactory fit with a small number of data points when an interval of the
physical phenomenon exhibits first, second, or third degree behavior.

Spline  versus  linear  interpolation 

22. Fig. 2 is a spline fit to the same set of data used in fig. 1.
Note the smoothness of the curve between data points. More important is the
continuous derivative behavior. This is a marked improvement over
the linear interpolation model shown in fig. 1.

Fig. 2. Spline approximation of stress-strain relation shown in fig. 1
(horizontal axis: axial strain)

Spline versus polynomial interpolation

23. An advantage of splines over polynomials is illustrated in fig. 3
which shows a set of 6 data points to which a fifth degree polynomial and a
spline have been fit. This example shows the influence of a bad data value
on both the spline and polynomial. Note how the spline tends to minimize
the influence of the bad point P. The polynomial on the other hand is
completely upset by the bad point.

24. This rather extreme example is introduced not as a practical
consideration but rather to illustrate the oscillating character of
polynomials that is so troublesome in curve fitting. Discussion of least
square polynomials is provided in the applications section, paragraphs
40-41.
Fig. 3. General comparison of spline and polynomial fits

Disadvantages of spline

25. Properties b and f (paragraph 15) are sometimes interpreted as
disadvantages. The spline is not defined as a single expression over the
entire range of the data points. Therefore, to evaluate S(x) one must first
find the two adjacent data points bounding x and then apply the equations
of paragraph 18. This results in longer computer time than that required in
polynomial fits. However, search time can be minimized by use of efficient
algorithms.
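
One efficient search of the kind paragraph 25 alludes to is a binary search over the sorted X values. The sketch below uses Python's standard `bisect` module; the function name `find_interval` and the sample abscissas are hypothetical, and the paper does not say which algorithm its own program uses.

```python
from bisect import bisect_right

def find_interval(X, x):
    """Index i such that X[i] <= x <= X[i+1], found by binary search.

    Values of x outside the data range are assigned to the first or
    last interval, which amounts to extrapolating with the end cubic.
    """
    i = bisect_right(X, x) - 1
    return min(max(i, 0), len(X) - 2)

# Hypothetical, unevenly spaced data abscissas.
X = [0.0, 1.0, 2.5, 4.0, 7.0]
for x in [-1.0, 0.0, 2.5, 3.0, 7.0, 10.0]:
    print(x, find_interval(X, x))
```

The search takes on the order of log2(N) comparisons instead of the N needed by a linear scan. When successive evaluations move slowly along the curve, as in an incremental loading solution, starting the search from the previously used interval is another common way to cut the cost.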
26. The number of coefficients needed to define the spline (three
times the number of data points) is too large for some engineering
applications. This is due to the fact that the computer codes themselves
(such as the finite element and finite difference codes) require vast
amounts of storage to solve physical problems. For instance, the finite
element analysis of soil-structure interaction problems often requires that
several
soil  zones  be  considered.  If  the  nonlinear  stress-strain  behavior  of  the 
soils  is  modeled  by  splines,  then  for  each  different  layer  of  soil  one  must 
store  parameters  for  a  different  spline  function.  Thus,  the  number  of 
storage  locations  required  by  splines  will  be  greatly  increased  in  such 
problems.  The  same  is  true  for  finite  difference  solutions  of  ground  shock 
problems.  The  engineer,  quite  naturally,  is  inclined  to  use  the  available 
storage  for  larger  physical  problems  rather  than  to  introduce  what  he  may 
feel  is  an  unnecessary  elegance  in  the  stress-strain  interpolation  routine. 
Summary 

27.  Although  WES  experience  with  splines  is  limited,  it  has  led  to 
the  conclusion  that  the  advantages  of  the  spline  (continuous  derivatives, 
smooth  curve,  dependability,  and  physical  insight)  far  outweigh  the  prob¬ 
lems  of  increased  computer  memory  and  somewhat  longer  computation  times. 
Further, through the skill of the numerical analyst and programmer, one
can exercise considerable influence in adapting the formulation and code
to minimize run time or storage, a point discussed in the applications
section.


PART III: APPLICATION OF SPLINES TO ENGINEERING PROBLEMS


28.  This  section  describes  the  application  of  cubic  splines  to  sev¬ 
eral  civil  engineering  problems  at  WES.  Applications  of  the  cubic  spline 
fitting  and  interpolating  techniques  to  hydraulics  problems,  soil-structure 
interaction  problems,  and  seepage  problems  are  presented. 

Hydraulics  Problems 

Rating  curves 

29. This application in the field of hydraulic engineering deals
with  modeling  rating  curves  in  a  flood  routing  program  authored  by  Mr.  E.  A. 
Graves  of  the  Lower  Mississippi  Valley  Division.  Mr.  Graves  and  Mr.  J.  B. 
Cheek,  Jr.,  of  WES  are  currently  involved  in  the  application  of  spline 
methods  to  this  problem. 

30.  Rating  curves  are  used  to  describe,  at  stated  points  (stations) 
on  a  river,  the  flow  (discharge)  characteristics  as  a  function  of  the 
height of the water in the river (stage). The flood routing program
produces very satisfactory results, but a great deal of difficulty is
experienced in obtaining polynomial fits to the rating curves that are required
as  input  to  the  program.  The  following  figures  and  discussion  illustrate 
the  improvement  brought  by  the  spline  interpolation  method  to  the  rating 
curve  model. 

31. Spline fit to 20 points. Fig. 4 shows 20 data points of a rating
curve with a spline fit through those points. Except for the large number
of points, the spline representation was considered satisfactory until the
derivative shown in fig. 5 was examined. It was characterized by an
objectionable lack of smoothness.

Fig. 4. Spline fit of 20-point rating curve

Fig. 5. Derivative for 20-point spline fit (horizontal axis: discharge)

32. Reducing number of points to improve derivative. In order to reduce
the number of data points and produce a smoother derivative curve, 6 of
the 20 original points were chosen and a spline was fitted to them. This is
illustrated in fig. 6 along with the original 20-point curve. The 6-point
curve does not match the original to the desired degree. It seems desirable
to add a few more points. In a situation such as this the physical insight
to the spline has its impact.

33. As mentioned previously, the mathematical spline formulation also
describes the deflection of a weightless linear elastic beam (a physical
spline) which is simply constrained at the data points. This means that we
can visualize the mathematical spline fit as that shape assumed by a thin
strip of steel which is constrained on the paper by a straight pin at each
data point. With this concept in mind, the effect of adding point P
(fig. 6) to the 6-point data set can be easily predicted. The new point P
would bend the spring closer to the original curve along A and would also
cause a downward movement along B.

Fig. 6. Comparison of 6- and 20-point spline fits

Fig. 7. Comparison of 7- and 20-point spline fits

34. Fig. 7, which shows the 7-point spline fit along with the original
20-point fit, shows the effect of adding point P. Note how nicely the
7-point fit follows the lower portion of the curve. By some further
adjustments, the upper portion of the curve could be improved, but it
is not necessary for the present purpose since the point has been made,
i.e., insight into the physical character of the mathematical spline makes
the curve-fitting process easy.

35. The derivative for the 7-point curve (fig. 8) shows much improvement
in smoothness. This improvement is presumed to be due to the reduced
number of points. This aspect of spline interpolation will be discussed
further in paragraphs 43-45.

36. Modification of spline fit. Because rating (or conveyance) curves
for an alluvial river change with time, it is sometimes necessary to change
them. With splines, this can be done by substituting new data points, with
the assurance that the new fit will be satisfactory. This contrasts with
the use of polynomials, which require many more data points and which must
be carefully studied to determine if a satisfactory fit has been obtained.
Also, in the application discussed, the first derivative is required, and
this can be obtained directly during the spline interpolation procedure.
Even in applications where the derivative will not be used, it is desirable
to obtain the first derivatives for use in studying the suitability of the
curve to the purpose in hand.

Fig. 8. Derivative for 7-point spline fit (horizontal axis: discharge)

Model  storage  curves 

37’  The  success  with  modeling  rating  curves  soon  led  to  expe-rime n- 

4 

tation  'with  modeling  storage  curves .  This  flood  routing  application  has 
severe  computer  memory  restrictions  on  it,  so  extra  steps  were  taken  to 
reduce the coefficients stored. This was accomplished by retaining only
the coordinates of the points and beam moment of each data point. This is
not normally done because the two other required coefficients for each
point, developed during the spline fit process, must be thrown out and
recalculated for the two bounding points during the interpolation
computation. However, by so doing, it is possible to use the spline in a

smoothing parameter. From a practical standpoint, it is generally
preferable and less time consuming to select a few data points at the outset
rather than to make repeated runs in search of a good parameter for each
new set of data. (This is not to imply that one method is superior to
the other, rather that interpolating splines can economically be made to
function satisfactorily for many scientific-engineering applications.)

44. Fig. 11 shows spline and polynomial fits to 16 of the original
89 points of fig. 9. Note that the points shown are not the result of
several trials to obtain the best set. They were simply selected so that
the x values were about an inch apart on the original drawing, a procedure
which certainly cannot be called "tuning" the data.

45. Fig. 12 shows the vastly improved spline derivative plot which now
appears superior to both the 89- and 16-point polynomial derivatives. Any
improvements required can be obtained with the aid of the aforementioned
insight about the physical character of the spline (paragraph 33).

Summary 

46. The authors have been directly involved in three soil-structure
interaction projects at WES:

a. R. E. Walker and Cheek (March 1970) on a dynamic nonlinear
elastic finite element method stress analysis program.

b. J. L. Kirkland and Cheek (April 1970) on a nonlinear elastic
incremental loading soil-structure interaction program.

c. FTC B. Phillips and Radhakrishnan (1970) on a finite difference
solution of nonlinear elastic analysis of ground motion.

Fig. 11. Comparisons of 16-point spline and polynomial fits (horizontal
axis: first invariant (X))

Fig. 12. Comparison of spline and polynomial derivatives for 16-point fits

This experience has indicated that the addition of splines to soil-
structure interaction codes is worthwhile. However, it cannot be stated
with certainty that the computed results are better, although there are
signs of improvement in the stability of the procedure (such as a reduction
in strain energy growth), since no known correct solution with which
to compare the results exists. It is felt that splines have removed some
obvious errors in modeling nonlinear characteristics and have, at the
least, allowed us to turn our research to other problems.

Steady-State Seepage Problems


Confined  seepage 

47. Producing a contour map from a set of data points

$$ (X_i, Y_i, Z_i) ; \quad i = 1, 2, 3, \ldots, N $$

has not in general been satisfactorily accomplished. A special case of
this problem where the data points occur on a rectangular grid has,
however, been solved using splines. Fig. 13 shows the data configuration.
48. To construct a contour map, a surface must first be fitted to the
data points. In this section, a method which fits a simple bicubic spline
to the data is described, and an application to confined steady-state
seepage under a weir is discussed.

Fig. 13. Data configuration for spline contouring (labels: row i,
column j, grid rectangle, grid spacings $h_x$ and $h_y$)

49. The first part of the procedure is to fit a cubic spline along
each row and each column of points

$$ (X_j, Y_i, Z_{i,j}) ; \quad i = 1, 2, 3, \ldots, i_{\max} ; \quad j = 1, 2, 3, \ldots, j_{\max} $$

where

$$ X_j = (j - 1) h_x $$
$$ Y_i = (i - 1) h_y $$
$$ Z_{i,j} = Z \text{ value corresponding to } (X_j, Y_i) $$

The cubic spline as defined on the jth interval of the ith row from
property f of paragraph 15 is

$$
S_i(x) = \frac{M^{(H)}_{i,j}}{6 h_x} (X_{j+1} - x)^3
       + \frac{M^{(H)}_{i,j+1}}{6 h_x} (x - X_j)^3
       + \left( Z_{i,j} - \frac{M^{(H)}_{i,j} h_x^2}{6} \right) \frac{X_{j+1} - x}{h_x}
       + \left( Z_{i,j+1} - \frac{M^{(H)}_{i,j+1} h_x^2}{6} \right) \frac{x - X_j}{h_x}
\qquad (4)
$$

where $M^{(H)}_{i,j}$ is the second derivative of the horizontal spline passing
through (i,j). Similarly, the spline as defined on the ith interval
of the jth column is

$$
S_j(y) = \frac{M^{(V)}_{i,j}}{6 h_y} (Y_{i+1} - y)^3
       + \frac{M^{(V)}_{i+1,j}}{6 h_y} (y - Y_i)^3
       + \left( Z_{i,j} - \frac{M^{(V)}_{i,j} h_y^2}{6} \right) \frac{Y_{i+1} - y}{h_y}
       + \left( Z_{i+1,j} - \frac{M^{(V)}_{i+1,j} h_y^2}{6} \right) \frac{y - Y_i}{h_y}
\qquad (5)
$$

where $M^{(V)}_{i,j}$ is the second derivative of the vertical spline passing
through (i,j).

50. From these cubic splines, the simple bicubic spline is generated.
In the region (grid rectangle)

$$ X_j \le x \le X_{j+1} , \qquad Y_i \le y \le Y_{i+1} $$

$$
S^{i,j}(x,y) = \frac{1}{2} S_j(y) \frac{X_{j+1} - x}{h_x}
             + \frac{1}{2} S_{j+1}(y) \frac{x - X_j}{h_x}
             + \frac{1}{2} S_i(x) \frac{Y_{i+1} - y}{h_y}
             + \frac{1}{2} S_{i+1}(x) \frac{y - Y_i}{h_y}
\qquad (6)
$$

where $S^{i,j}(x,y)$ is the bicubic spline as defined in the grid rectangle
whose lower left-hand coordinates are $(X_j, Y_i)$. From this bicubic spline,
the actual construction of the contour lines can be accomplished as
explained in reference 5.

51. A general purpose contouring program, documented in reference 4,
which uses the above method, has been applied to several problems. One of
the most important of these is producing equipotential lines from steady-
state confined seepage under a weir. The problem is illustrated in fig. 14.
It consists of a weir resting on a pervious homogeneous region underlain
by an impervious base. Seepage occurs through the pervious medium because
of the head differential between its two ends. This seepage produces
uplift pressure on the base of the weir, and the engineer is interested in
the distribution of pressures in the pervious medium. The results are
normally expressed as contours of equal pressures or potentials called
equipotentials.

52. Region ABCF in fig. 14 is divided into a grid system, and a
finite difference solution to the governing partial differential equation
(Laplace's equation)

$$ \frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} = 0 $$

with boundary conditions

$$ h = h_U \quad \text{on AFE} $$
$$ h = h_L \quad \text{on DCB} $$
$$ \frac{\partial h}{\partial y} = 0 \quad \text{on ED and AB} $$

(ED and AB are flow lines. There is no flow across these boundaries.)
is obtained.$^{6}$ The output from this solution is used as input to the
contouring program to obtain the equipotential lines. The results are shown
in fig. 14. The equipotentials are labelled as the percentage
$[(h_E - h_L)/(h_U - h_L)] \times 100$, where $h_E$ is a given equipotential.

53.  The  quality  of  the  contour  map  can  be  illustrated  in  three  ways: 

a.  The  contour  lines  are  smooth. 

b. The problem was set up such that the center of the weir was
in the center of the region. One would therefore expect the
50% equipotential line to be a line of symmetry. This is
indeed the case.

c. The boundary condition $\partial h/\partial y = 0$ on ED and AB requires
that the equipotential lines intersect the line segments
orthogonally.  Again,  this  is,  in  fact,  the  situation. 
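
The finite difference step of paragraph 52 and the symmetry check of item b can be sketched as follows. This is a hedged illustration only: the grid size, the head values, the weir position, and the Jacobi iteration are all assumptions made for the example, and the reading of the boundary labels (AF and FE upstream, DC and CB downstream, ED the weir base, AB the impervious base) is inferred from the text because fig. 14 is not reproduced here.

```python
nx, ny = 21, 11            # grid columns and rows (hypothetical sizes)
h_up, h_down = 1.0, 0.0    # upstream and downstream heads (hypothetical)
weir = range(8, 13)        # top-row columns under the weir base, centered

# h[r][c]: row 0 is the impervious base AB, row ny - 1 is the top boundary.
h = [[0.5 * (h_up + h_down)] * nx for _ in range(ny)]

def apply_fixed_heads(g):
    """Impose h = h_up on AFE and h = h_down on DCB."""
    for r in range(ny):
        g[r][0] = h_up               # upstream side AF
        g[r][nx - 1] = h_down        # downstream side CB
    for c in range(nx):
        if c < weir.start:
            g[ny - 1][c] = h_up      # upstream ground surface FE
        elif c >= weir.stop:
            g[ny - 1][c] = h_down    # downstream ground surface DC

apply_fixed_heads(h)
for _ in range(3000):                # Jacobi sweeps on Laplace's equation
    new = [row[:] for row in h]
    for r in range(ny):
        for c in range(1, nx - 1):
            if r == ny - 1 and c not in weir:
                continue             # fixed-head node, leave unchanged
            # Mirror nodes enforce dh/dy = 0 on AB (bottom) and ED (weir base).
            below = h[r - 1][c] if r > 0 else h[r + 1][c]
            above = h[r + 1][c] if r < ny - 1 else h[r - 1][c]
            new[r][c] = 0.25 * (h[r][c - 1] + h[r][c + 1] + below + above)
    h = new

# Express heads as the percentage defined in paragraph 52.
percent = [[100.0 * (v - h_down) / (h_up - h_down) for v in row] for row in h]
print([round(percent[r][(nx - 1) // 2], 3) for r in range(ny)])
```

Because the weir is centered, the column of nodes on the centerline sits on the 50 percent equipotential, which is the symmetry property cited in item b. The resulting grid of heads is the kind of input a contouring step such as the bicubic spline method would receive.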


54.  It  is  concluded  that  the  simple  bicubic  spline  is  an  excellent 
function  for  contouring  data  established  on  a  grid. 

Unconfined  seepage 

55. A problem of classic importance in civil engineering analysis
is that of unconfined flow in an earth dam, with special interest given to
the phreatic surface, which is a top flow line across which there is no
flow. Another property of the phreatic surface is that the potential at
any point on the surface is equal to the elevation head at that point.

56.  This  problem  is  illustrated  in  fig.  15.  The  phreatic  surface 
(which is at atmospheric pressure) is itself derived from the flow
equations and must generally be fixed by empirical or trial and error schemes.
The  trial  and  error  procedure  is  described  here. 

57. For steady-state conditions, the governing partial differential
equation is

$$ \frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} = 0 $$

with boundary conditions

on AB
on AD
on BE
on DCE
on DC

The procedures used in solving this problem are as follows:

a.  Make  an  initial  guess  to  the  phreatic  surface  DC  . 

b. Solve the problem by finite difference methods as if it were
a confined seepage problem using $\partial h/\partial n = 0$ and $P/\rho g$ = constant
on DC (P = pressure; $\rho$ = mass density of fluid;
g = acceleration due to gravity).

c. Adjust DC to satisfy h = y.

d. Repeat b and c until $\partial h/\partial n = 0$ and h = y are satisfied
on DC simultaneously.

58.  The  principal  difficulty  of  this  procedure  is  in  satisfying 

Q 

$\partial h/\partial n = 0$ on DC. This is because DC intersects the finite difference
grid at irregular points, as illustrated in fig. 15. Incorporating splines
into the procedure may essentially eliminate this difficulty. Step b
should then be altered as follows:

a.  Fit  a  cubic  spline  along  DC  as  shown  in  fig.  15.  Note 
that  the  points  of  intersection  of  the  phreatic  surface 
and  the  grid  are  used  as  data  points.  Use  the  set  of  equa¬ 
tions  which  solves  for  the  slopes  at  the  data  points  (see 
Appendix  A  for  details),  and  set 

=  -cot  a 

S„  =  -tan  0 
N 

where  N  is  the  number  of  data  points. 

b. Replace $\partial h/\partial n = 0$ and $P/\rho g$ = constant on DC with the
equivalent expressions


i  =  1,  2,  3...N 


where the $S_i$'s are the slopes as computed by the spline
fit.

c. Incorporate these simple formulas into the finite difference
solution to obtain the next trial values.


PART IV: CONCLUSIONS


59. The spline interpolation technique is a valuable tool in curve
fitting for computer programs for which more commonly used techniques are
unsuitable or of limited value. The spline technique has three primary
advantages:

a.  The  spline  passes  through  all  data  points. 

b.  The  first  and  second  derivatives  of  the  spline  are  continuous 
at  all  points . 

c.  The  spline  can  be  easily  modified  to  satisfy  new  or  addi¬ 
tional  data. 

WES  experience  in  applying  spline  techniques  to  engineering  problems  has 
indicated  that  in  many  applications  these  advantages  outweigh  the  addi¬ 
tional  storage  and/or  computation  time  requirements  of  the  technique. 

60. Since the spline function is required to pass through all data
points,  erratic  derivative  behavior  may  result  from  experimental  error 
when  the  data  points  are  numerous  and  closely  spaced.  Trial  and  error 
methods  for  smoothing  such  functions  exist,  but  they  are  time  consuming. 

WES  experience  has  indicated  that  acceptable  results  can  generally  be  ob¬ 
tained  by  simply  selecting  a  more  limited,  more  widely  spaced  set  of  the 
data  points  to  which  to  fit  the  curve. 
APPENDIX A: EQUATIONS FOR CUBIC SPLINE


1. A cubic spline is a function whose third derivative is a step
function with points of discontinuity at the data points

$$(X_i, Y_i); \quad i = 1, 2, 3, \ldots, N$$

where N is the number of data points. The spline function of degree
three is therefore a piecewise, continuous third-degree polynomial having
continuous first and second derivatives.

2. For this application the function will be further required to
interpolate the data points. Hence, on the $i^{\text{th}}$ interval

$$X_i \le x \le X_{i+1}; \quad i = 1, 2, 3, \ldots, N - 1$$

$$S(x) = a_i + b_i x + c_i x^2 + d_i x^3 \tag{A1}$$

where $a_i$, $b_i$, $c_i$, and $d_i$ are constants to be evaluated. Equation
A1 can be rewritten as


$$S(x) = a_i'(x - X_i)^3 + b_i'(X_{i+1} - x)^3 + c_i'(x - X_i) + d_i'(X_{i+1} - x) \tag{A2}$$


where $a_i'$, $b_i'$, $c_i'$, and $d_i'$ are an alternate set of constants.
Differentiating equation A2 twice yields

$$S''(x) = 6a_i'(x - X_i) + 6b_i'(X_{i+1} - x) \tag{A3}$$


Applying

$$S''(X_i) = M_i \tag{A4}$$

$$S''(X_{i+1}) = M_{i+1} \tag{A5}$$

where $M_i$ is the second derivative at point $(X_i, Y_i)$, the result is

$$a_i' = \frac{M_{i+1}}{6(X_{i+1} - X_i)}, \qquad b_i' = \frac{M_i}{6(X_{i+1} - X_i)} \tag{A6}$$


3. Since the spline must interpolate the data points

$$S(X_i) = Y_i \tag{A7}$$

$$S(X_{i+1}) = Y_{i+1} \tag{A8}$$

Substituting these into equation A2 produces

$$c_i' = \frac{Y_{i+1} - a_i'(X_{i+1} - X_i)^3}{X_{i+1} - X_i} = \frac{Y_{i+1}}{X_{i+1} - X_i} - \frac{M_{i+1}}{6}(X_{i+1} - X_i) \tag{A9}$$
Differentiating
$$M_{i-1}(X_i - X_{i-1}) + 2M_i(X_{i+1} - X_{i-1}) + M_{i+1}(X_{i+1} - X_i) = 6\left[\frac{Y_{i+1} - Y_i}{X_{i+1} - X_i} - \frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}\right]; \quad i = 2, 3, 4, \ldots, N - 1 \tag{A15}$$

A good choice for $M_1$ and $M_N$ is

$$M_1 = M_N = 0 \tag{A16}$$

This set of simultaneous linear equations can be solved for the $M_i$'s.
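As an illustration only, the tridiagonal system A15 with the end condition A16 can be solved directly by forward elimination and back substitution. The following Python sketch is not part of the original report (whose own routine, SPLINE, is in FORTRAN and iterates instead); the function name `spline_moments` and the use of the Thomas algorithm are assumptions made for the example.

```python
def spline_moments(x, y):
    """Solve equations A15 and A16 for the moments M_i (second derivatives).

    x, y: sequences of data-point coordinates, x strictly increasing.
    Returns a list of N moments with M_1 = M_N = 0 (equation A16).
    Hypothetical helper, not taken from the report.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two data points")
    m = [0.0] * n
    if n == 2:
        return m
    # One row of A15 for each interior point.
    sub, diag, sup, rhs = [], [], [], []
    for i in range(1, n - 1):
        h_lo = x[i] - x[i - 1]          # X_i - X_{i-1}
        h_hi = x[i + 1] - x[i]          # X_{i+1} - X_i
        sub.append(h_lo)                # coefficient of M_{i-1}
        diag.append(2.0 * (h_lo + h_hi))  # coefficient of M_i
        sup.append(h_hi)                # coefficient of M_{i+1}
        rhs.append(6.0 * ((y[i + 1] - y[i]) / h_hi
                          - (y[i] - y[i - 1]) / h_lo))
    # Thomas algorithm: forward elimination. The end moments are zero,
    # so the first sub entry and the last sup entry never contribute.
    k = len(diag)
    for j in range(1, k):
        w = sub[j] / diag[j - 1]
        diag[j] -= w * sup[j - 1]
        rhs[j] -= w * rhs[j - 1]
    # Back substitution.
    sol = [0.0] * k
    sol[-1] = rhs[-1] / diag[-1]
    for j in range(k - 2, -1, -1):
        sol[j] = (rhs[j] - sup[j] * sol[j + 1]) / diag[j]
    m[1:n - 1] = sol
    return m
```

For data lying on a straight line the right-hand side of A15 vanishes, so every moment comes out zero, which is a convenient sanity check.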

5. It is sometimes more desirable to solve for first derivatives
at the data points rather than second derivatives. This set of simultaneous
equations is derived as follows. Satisfying

$$S'(X_i) = T_i \tag{A17}$$

$$S'(X_{i+1}) = T_{i+1} \tag{A18}$$

where $T_i$ is the first derivative at $(X_i, Y_i)$, equation A12 becomes

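$$T_i = -\frac{M_i}{2}(X_{i+1} - X_i) + \frac{Y_{i+1} - Y_i}{X_{i+1} - X_i} + \frac{1}{6}(M_i - M_{i+1})(X_{i+1} - X_i) \tag{A19}$$

$$T_{i+1} = \frac{M_{i+1}}{2}(X_{i+1} - X_i) + \frac{Y_{i+1} - Y_i}{X_{i+1} - X_i} + \frac{1}{6}(M_i - M_{i+1})(X_{i+1} - X_i) \tag{A20}$$

Collecting terms

$$T_i = -\left(\frac{M_i}{3} + \frac{M_{i+1}}{6}\right)(X_{i+1} - X_i) + \frac{Y_{i+1} - Y_i}{X_{i+1} - X_i} \tag{A21}$$

$$T_{i+1} = \left(\frac{M_i}{6} + \frac{M_{i+1}}{3}\right)(X_{i+1} - X_i) + \frac{Y_{i+1} - Y_i}{X_{i+1} - X_i} \tag{A22}$$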
Solving for $M_i$ and $M_{i+1}$

$$M_i = \frac{2}{X_{i+1} - X_i}\left[3\,\frac{Y_{i+1} - Y_i}{X_{i+1} - X_i} - 2T_i - T_{i+1}\right] \tag{A23}$$

$$M_{i+1} = \frac{2}{X_{i+1} - X_i}\left[T_i + 2T_{i+1} - 3\,\frac{Y_{i+1} - Y_i}{X_{i+1} - X_i}\right] \tag{A24}$$


Requiring continuity of the second derivative at data points i = 2, 3,
4, ..., N - 1

$$S''(X_i^+) = S''(X_i^-) \tag{A25}$$

Applying equations A23 and A24

$$\frac{1}{X_{i+1} - X_i}\left[3\,\frac{Y_{i+1} - Y_i}{X_{i+1} - X_i} - 2T_i - T_{i+1}\right] = \frac{1}{X_i - X_{i-1}}\left[T_{i-1} + 2T_i - 3\,\frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}\right] \tag{A26}$$

Or

$$(X_{i+1} - X_i)T_{i-1} + 2(X_{i+1} - X_{i-1})T_i + (X_i - X_{i-1})T_{i+1} = 3\left[\frac{X_{i+1} - X_i}{X_i - X_{i-1}}(Y_i - Y_{i-1}) + \frac{X_i - X_{i-1}}{X_{i+1} - X_i}(Y_{i+1} - Y_i)\right]; \quad i = 2, 3, 4, \ldots, N - 1 \tag{A27}$$

The condition $M_1 = M_N = 0$ becomes


$$2T_1 + T_2 = 3\,\frac{Y_2 - Y_1}{X_2 - X_1} \tag{A28}$$

$$T_{N-1} + 2T_N = 3\,\frac{Y_N - Y_{N-1}}{X_N - X_{N-1}} \tag{A29}$$

This system may be solved for the first derivatives at the data points.
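The slope formulation can be sketched the same way. The Python fragment below assembles equations A27 through A29 as one tridiagonal system and solves it by elimination; it is an illustrative sketch, not code from the report, and the name `spline_slopes` is hypothetical.

```python
def spline_slopes(x, y):
    """Solve equations A27-A29 for the first derivatives T_i at the data points.

    x, y: sequences of data-point coordinates, x strictly increasing.
    Hypothetical helper, not taken from the report.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two data points")
    sub = [0.0] * n
    diag = [0.0] * n
    sup = [0.0] * n
    rhs = [0.0] * n
    # Equation A28: 2*T_1 + T_2 = 3*(Y_2 - Y_1)/(X_2 - X_1)
    diag[0], sup[0] = 2.0, 1.0
    rhs[0] = 3.0 * (y[1] - y[0]) / (x[1] - x[0])
    # Equation A27 at each interior point.
    for i in range(1, n - 1):
        h_lo = x[i] - x[i - 1]
        h_hi = x[i + 1] - x[i]
        sub[i] = h_hi                    # coefficient of T_{i-1}
        diag[i] = 2.0 * (h_lo + h_hi)    # coefficient of T_i
        sup[i] = h_lo                    # coefficient of T_{i+1}
        rhs[i] = 3.0 * (h_hi / h_lo * (y[i] - y[i - 1])
                        + h_lo / h_hi * (y[i + 1] - y[i]))
    # Equation A29: T_{N-1} + 2*T_N = 3*(Y_N - Y_{N-1})/(X_N - X_{N-1})
    sub[n - 1], diag[n - 1] = 1.0, 2.0
    rhs[n - 1] = 3.0 * (y[n - 1] - y[n - 2]) / (x[n - 1] - x[n - 2])
    # Forward elimination followed by back substitution.
    for i in range(1, n):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    t = [0.0] * n
    t[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        t[i] = (rhs[i] - sup[i] * t[i + 1]) / diag[i]
    return t
```

With the four test points of table B3 this sketch should reproduce the end slopes printed there (the values of Y PRIME at X = 1 and X = 9, where SPLINT extrapolates with the end-point slope).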
APPENDIX B: SPLINE FITTING AND INTERPOLATING SUBROUTINES

1. This appendix contains listings of the two FORTRAN language
subroutines, SPLINE (table B1) and SPLINT (table B2). The following sections
describe the use of the two routines.

Subroutine  SPLINE 


2.  The  SPLINE  subroutine  is  used  to  fit  a  cubic  spline  to  a  set  of 
(x,y)  data.  That  is,  it  calculates  the  moments  at  each  interior  data 
point,  assigning  zero  to  the  moments  at  the  end  points.  Note  that  only 
three  arrays  are  required  for  each  spline,  an  x  ,  a  y  ,  and  a  moment 
array.  The  spline  fit  is  accomplished  by  a  call  statement  in  the  user's 
program  of  the  form: 


CALL SPLINE (A1, A2, N3, A4)


where the arguments A1, A2, N3, and A4 are as follows:


Argument   Purpose

A1         Names the independent variable array (x).

A2         Names the dependent variable array (y).

N3         Specifies the number of (x,y) points. N3 must be
           in integer form.

A4         Names the array into which SPLINE will store the
           calculated moments.


3. Note that the user must specify the size of arrays A1, A2, and
A4 through DIMENSION or COMMON statements. Several splines may be fit
and saved for subsequent use by making successive calls to SPLINE using
different names for arguments A1, A2, and A4 and indicating the number
of points through argument N3.

4.  Example:  Given  two  sets  of  (x,y)  data,  fit  a  spline  to  each 
set  of  data. 
5. Solution: Let one set of data be in arrays X1 and Y1 having
N1 points. Let the other set of data be in X2, Y2 having N2 points. The
following FORTRAN statements are required, assuming that there are no more
than 30 points in either data set.

DIMENSION X1(30), Y1(30), C1(30), X2(30), Y2(30), C2(30)

Statements to input the two data
sets and specify their sizes in
N1 and N2. (Note that C1 and C2
need not be set to any specific
value.)

CALL SPLINE (X1, Y1, N1, C1)

CALL SPLINE (X2, Y2, N2, C2)

6. At this point array C1 will contain N1 moments, and array C2
will contain N2 moments; the first and last moment in each array will be
zero.

7. Note that N1 need not equal N2 but neither may be less than 2
nor greater than the maximum size specified for the associated arrays (30
in this example).
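For readers more comfortable with a modern language, the iteration that SPLINE performs can be sketched in Python. This is an interpretation of the listing in table B1, not a verified translation: the routine appears to relax the interior moments one at a time with an over-relaxation factor (OMEGA = -1.0717968 in the listing, taken here as 1.0717968 with the sign moved into the update) until the largest change in a sweep is negligible relative to the moment being changed. The name `spline_sor`, the `max_sweeps` guard, and the use of "less than or equal" in the stopping test are assumptions of this sketch.

```python
def spline_sor(x, y, omega=1.0717968, eps=1.0e-6, max_sweeps=1000):
    """Iterative estimate of natural-spline moments, modeled on table B1.

    Hypothetical Python rendering; argument names are not from the report.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two data points")
    m = [0.0] * n                       # end moments stay zero

    def divided(i):
        # Interval lengths and first divided differences around point i.
        h_lo = x[i] - x[i - 1]
        h_hi = x[i + 1] - x[i]
        d_lo = (y[i] - y[i - 1]) / h_lo
        d_hi = (y[i + 1] - y[i]) / h_hi
        return h_lo, h_hi, d_lo, d_hi

    # Starting estimates for the interior moments (first loop of the listing).
    for i in range(1, n - 1):
        h_lo, h_hi, d_lo, d_hi = divided(i)
        m[i] = 2.0 * (d_hi - d_lo) / (h_lo + h_hi)

    for _ in range(max_sweeps):
        eta = 0.0                       # largest change in this sweep
        beta = 0.0                      # moment value before that change
        for i in range(1, n - 1):
            h_lo, h_hi, d_lo, d_hi = divided(i)
            b = 0.5 * h_lo / (h_lo + h_hi)
            residual = (b * m[i - 1] + (0.5 - b) * m[i + 1] + m[i]
                        + 3.0 * (d_lo - d_hi) / (h_lo + h_hi))
            w = -omega * residual
            m[i] += w
            if abs(w) > eta:
                eta = abs(w)
                beta = m[i] - w
        if eta <= eps * abs(beta):
            break
    return m
```

A direct tridiagonal solve gives the same moments without iteration; the relaxation form is shown only because it mirrors the structure of the FORTRAN listing and needs no work arrays beyond the moment array itself.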


Subroutine  SPLINT 

8. Subroutine SPLINT (SPLine INTerpolate) operates on a cubic spline
defined by the (x,y) coordinates of N points and the moment at each of
those points. It calculates values of y and y' for any value XX of
the independent variable x. Should the value XX lie beyond the range
of data defined by the points, this program will extrapolate linearly from
the first (or last) point using the slope of the spline at that end point.

9.  Interpolating  is  accomplished  by  a  call  statement  of  the  form: 

CALL SPLINT (A1, A2, A3, A4, A5, A6, N7)


where the arguments A1-A6 and N7 are as follows:


Argument   Purpose

A1         Specifies the value XX of the independent variable
           x for which y and y' are desired.

A2         Receives the value of y at XX computed by
           SPLINT.

A3         Receives the value of y' at XX computed by
           SPLINT.

A4         Names the independent variable array of the spline.

A5         Names the dependent variable array of the spline.

A6         Names the moment array of the spline.

N7         Specifies the number of (x,y) points that define
           the spline. N7 must be in integer form.

10. Note that arrays A4, A5, and A6 must contain N7 values each of
x, y, and moment, respectively; i.e., they contain the spline defining
data. Argument A1 is an input argument for XX, while A2 and A3 receive
the values for y and y' calculated by SPLINT.

11. Two of the variables in subroutine SPLINT may be useful in some
applications. Variable FPPXX contains the value for y''. Variable MM
indicates whether the computation was an interpolation, in which case MM
is zero, or an extrapolation, in which case MM is -1. A reduction in run
time would likely result from incorporating a more sophisticated search
procedure than that used (statement numbers 100 through 140 in table B2).

12.  Example:  Assuming  that  the  steps  outlined  in  the  example  for 
subroutine  SPLINE  have  been  taken,  the  following  statements  would  calculate 
y  values  (YY)  and  y'  (YPRIME)  at  XX  =  36.49  from  the  second  set  of 
spline  data. 

XX  =  36.49 

CALL SPLINT (XX, YY, YPRIME, X2, Y2, C2, N2)
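The evaluation step can also be sketched in Python. The fragment below follows the arithmetic of table B2 as far as it can be read: it locates the interval, evaluates y and y' from the two bracketing points and their moments, and extrapolates linearly with the end-point slope when XX is outside the data. The name `splint`, the return convention, and the Python search loop are assumptions of this sketch; the division of the PROD * S3 term by 6 follows from differentiating the expression for y and is what reproduces the Y PRIME column of table B3.

```python
def splint(xx, x, y, m):
    """Evaluate a cubic spline and its first derivative at xx.

    x, y: data-point coordinates; m: moments (second derivatives) at those
    points. Returns (value, slope). Hypothetical Python rendering of the
    SPLINT arithmetic; names are not from the report.
    """
    n = len(x)
    extrapolating = False
    xp = xx
    if xp < x[0]:
        extrapolating, xp, i = True, x[0], 0
    elif xp >= x[-1]:
        extrapolating = xp > x[-1]
        xp, i = x[-1], n - 2
    else:
        # Simple linear search, as in the listing; bisection would be faster.
        i = 0
        while xp >= x[i + 1]:
            i += 1
    ht1 = xp - x[i]
    ht2 = xp - x[i + 1]
    prod = ht1 * ht2
    dx = x[i + 1] - x[i]
    dely = (y[i + 1] - y[i]) / dx
    s3 = (m[i + 1] - m[i]) / dx          # third derivative on the interval
    fppxx = m[i] + ht1 * s3              # second derivative at xp
    delsqs = (m[i] + m[i + 1] + fppxx) / 6.0
    fxx = y[i] + ht1 * dely + prod * delsqs
    fpxx = dely + (ht1 + ht2) * delsqs + prod * s3 / 6.0
    if extrapolating:
        # Straight-line extension from the nearest end point.
        fxx += fpxx * (xx - xp)
    return fxx, fpxx
```

A call such as `splint(36.49, x2, y2, c2)` would then play the role of the CALL SPLINT statement in the example above, with `x2`, `y2`, and `c2` holding the second data set and its moments.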

Test  Program 

13. Table B3 shows a simple test program and the ten interpolated
values computed using the GE 430 Time Sharing system at the U. S. Army
Engineer Waterways Experiment Station.
Table B1

Subroutine SPLINE


      SUBROUTINE SPLINE (X, ZY, N, S2)
C     SPLINE FITTING SUBROUTINE ADAPTED FROM
C     WORK BY GREVILLE, U S ARMY MATH. RESEARCH CENTER,
C     UNIV OF WISCONSIN, T S REPORT 893,
C     JUNE 1968.
      DIMENSION X(1), ZY(1), S2(1)
      DATA EPSLN /1.E-6/
      N1 = N - 1
C     FIRST PASS: STARTING ESTIMATES OF THE INTERIOR MOMENTS.
      ASSIGN 110 TO ISW
      DO 130 I = 1, N1
      H = X(I + 1) - X(I)
      DLY = (ZY(I + 1) - ZY(I)) / H
      GO TO ISW, (110, 100)
  100 H2ZZZ = HL + H
      S2(I) = 2. * (DLY - YL) / H2ZZZ
      GO TO 120
  110 ASSIGN 100 TO ISW
  120 HL = H
      YL = DLY
C
  130 CONTINUE
      S2(1) = 0.
      S2(N) = 0.
      OMEGA = - 1.0717968
C     RELAXATION SWEEPS. ETA IS THE LARGEST CHANGE IN A SWEEP.
  140 ETA = 0.
      ASSIGN 170 TO ISW1
      DO 190 I = 1, N1
      H = X(I + 1) - X(I)
      DLY = (ZY(I + 1) - ZY(I)) / H
      GO TO ISW1, (170, 150)
  150 H2ZZZ = HL + H
      BI = .5 * HL / H2ZZZ
      W = (BI * S2(I - 1) + (.5 - BI) * S2(I + 1) + S2(I)
     1   + 3. * (YL - DLY) / H2ZZZ) * OMEGA
      S2(I) = S2(I) + W
      Z = ABS(W)
      IF (Z - ETA) 180, 180, 160
  160 ETA = Z
      BETA = S2(I) - W
      GO TO 180
  170 ASSIGN 150 TO ISW1
  180 HL = H
      YL = DLY
  190 CONTINUE
      IF (ABS(BETA) * EPSLN - ETA) 140, 140, 200
  200 CONTINUE
      RETURN
      END
Table B2

Subroutine SPLINT


      SUBROUTINE SPLINT (XB, FXX, FPXX, X, ZY, S2, N)
C     SPLINE INTERPOLATING SUBROUTINE, BY JAY CHEEK
      DIMENSION X(1), ZY(1), S2(1)
      MM = 0
      XP = XB
      I = 1
C     LOCATE THE INTERVAL CONTAINING XP (STATEMENTS 100 THROUGH 160).
      IF (XP - X(1)) 100, 170, 110
  100 MM = - 1
      XP = X(1)
      GO TO 170
  110 IF (XP - X(N)) 130, 150, 140
  120 IF (XP - X(I)) 160, 170, 130
  130 I = I + 1
      GO TO 120
  140 MM = - 1
      XP = X(N)
  150 I = N
  160 I = I - 1
  170 HT1 = XP - X(I)
      HT2 = XP - X(I + 1)
      PROD = HT1 * HT2
      DX = X(I + 1) - X(I)
      DELY = (ZY(I + 1) - ZY(I)) / DX
      S3 = (S2(I + 1) - S2(I)) / DX
      FPPXX = S2(I) + HT1 * S3
      DELSQS = (S2(I) + S2(I + 1) + FPPXX) / 6.
      FXX = ZY(I) + HT1 * DELY + PROD * DELSQS
C     THE DIVISOR 6. ON THE LAST TERM IS NEEDED TO MATCH TABLE B3.
      FPXX = DELY + (HT1 + HT2) * DELSQS + PROD * S3 / 6.
      IF (MM.EQ.0) GO TO 180
C     LINEAR EXTRAPOLATION BEYOND THE FIRST OR LAST POINT.
      FXX = FXX + FPXX * (XB - XP)
  180 CONTINUE
      RETURN
      END

READY
Table B3

Test Program for SPLINE and SPLINT Subroutines


      DIMENSION X(10), Y(10), C(10)
C     SET THE TEST DATA.
      X(1) = 1.6
      Y(1) = 1.
      X(2) = 5.4
      Y(2) = 2.
      X(3) = 7.
      Y(3) = 1.
      X(4) = 8.2
      Y(4) = 1.
      NUMB = 4
C
C     FIT A SPLINE THROUGH THE X,Y DATA POINTS.
      CALL SPLINE (X, Y, NUMB, C)
C
C     INTERPOLATE AT INTERVALS OF ONE.
      PRINT 300
      DO 100 I = 1, 10
      XI = I
      CALL SPLINT (XI, YY, YP, X, Y, C, NUMB)
  100 PRINT 200, XI, YY, YP
      STOP
  200 FORMAT (3X, 3E20.9)
  300 FORMAT (//10X, 1HX, 19X, 1HY, 19X, 7HY PRIME /)
      END

READY

RUN


.1.1.1  08:31  WES  05/19/71


         X                  Y                  Y PRIME

  0.100000000E+01    0.606953327E+00    0.655077788E+00
  0.200000000E+01    0.126029407E+01    0.642049980E+00
  0.300000000E+01    0.184263327E+01    0.495487139E+00
  0.400000000E+01    0.219698582E+01    0.186076697E+00
  0.500000000E+01    0.216050413E+01    -.286181346E+00
  0.600000000E+01    0.160917168E+01    -.727131570E+00
  0.700000000E+01    0.100000000E+01    -.338579530E+00
  0.800000000E+01    0.967082546E+00    0.155182285E+00
  0.900000000E+01    0.113543181E+01    0.169289765E+00
  0.100000000E+02    0.130472158E+01    0.169289765E+00

STOP


RUNNING TIME:  3.2 SECS    I/O TIME:  .4 SECS
A CORRECTING PROCESS DESIGNED TO CONTROL PROPAGATED ERROR


Isaac  S.  Metts,  Jr. 

Walter  Reed  Army  Institute  of  Research 
Walter  Reed  Army  Medical  Center 
Washington,  D.  C. 


Abstract. A correcting process which allows the differential equation
being  considered  to  choose  the  corrector  to  be  used  at  each  step  is  formed 
by  controlling  the  propagation  coefficients.  It  is  investigated  for 
stability  and  effect  on  propagated  error  and  numerical  results  are  given 
to  support  its  validity. 

1. Introduction. In this paper we shall introduce a new family of
corrector formulas and a new correcting process specifically designed to
control the growth of propagated error. We will restrict our attention
to the solving of differential equations of the form $y' = f(x,y)$, $y_0 = y(x_0)$.
This paper is related to a paper (P-1173) submitted to "Mathematics of
Computation" by Professor James R. Wesson of Vanderbilt University in
early 1967.

In part two of the paper we will present background information in
the same format as followed by many papers in this area. One such paper
which provided particularly helpful guidance was presented by T. E. Hull
and A. C. R. Newberry [5].

2. Background Information. We are concerned with correctors which
are members of the family of "closed" formulas of the form

$$y_{n+1} = A_n y_n + A_{n-1} y_{n-1} + \cdots + A_{n-k} y_{n-k} + h\left[B_{n+1} y'_{n+1} + B_n y'_n + \cdots + B_{n-k} y'_{n-k}\right] + R_{n+1}, \tag{1}$$

where $R_{n+1}$ is the truncation error for the $(n+1)^{\text{st}}$ application of the
corrector and h is the step size for equally spaced data.


To insure uniform lines of thought, we give several definitions.
Associated with (1) are two defining polynomials

$$\rho(r) = r^{k+1} - A_n r^k - A_{n-1} r^{k-1} - \cdots - A_{n-k} \tag{2}$$

and

$$\sigma(r) = B_{n+1} r^{k+1} + B_n r^k + \cdots + B_{n-k}. \tag{3}$$

Definition 1. For a formula to be Dahlquist stable or Asymptotically
stable requires

(a) all roots of (2) be within or on the unit circle, and

(b) those roots lying on the unit circle must have multiplicity one [2].
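As a small illustration of this root condition, the following Python sketch checks it for the special case k = 1, where the defining polynomial (2) is the quadratic $r^2 - A_n r - A_{n-1}$ and its roots are available in closed form. The function name, the tolerance, and the restriction to two-step formulas are assumptions made for the example; they are not part of the paper.

```python
import cmath


def dahlquist_stable_two_step(a_n, a_nm1, tol=1e-9):
    """Root condition for rho(r) = r**2 - a_n*r - a_nm1 (the case k = 1).

    Returns True when both roots lie in or on the unit circle and any
    root on the circle is simple. Hypothetical helper for illustration.
    """
    disc = cmath.sqrt(a_n * a_n + 4.0 * a_nm1)
    roots = [(a_n + disc) / 2.0, (a_n - disc) / 2.0]
    # Condition (a): no root outside the unit circle.
    if any(abs(r) > 1.0 + tol for r in roots):
        return False
    # Condition (b): roots on the unit circle must not coincide.
    on_circle = [r for r in roots if abs(abs(r) - 1.0) <= tol]
    if len(on_circle) == 2 and abs(on_circle[0] - on_circle[1]) <= tol:
        return False
    return True
```

For Simpson's Rule, $y_{n+1} = y_{n-1} + \ldots$, the coefficients are $A_n = 0$ and $A_{n-1} = 1$, so the roots are +1 and -1 and the check passes; the choice $A_n = 2$, $A_{n-1} = -1$ gives a double root at 1 and fails condition (b).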


It  should  be  noted  that  this  is  not  necessarily  the  condition  of 
stability  which  leads  to  the  most  favorable  propagation  of  error.  This  is 
illustrated  by  the  fact  that  the  well-known  corrector,  Simpson's  Rule,  is 
Dahlquist  stable  but  propagates  error  most  unfavorably.  We  are  therefore 
more  interested  in  two  other  variations  of  this  idea  of  stability  which 
concern  themselves  with  the  roots  of 

$$\tau(r) = (1 - KB_{n+1}) r^{k+1} - (A_n + KB_n) r^k - \cdots - (A_{n-k} + KB_{n-k}) \tag{4}$$

where $K = h\,\partial f/\partial y$ (we will assume this partial always exists). We see from
[5] that if restrictions are placed on the corrector coefficients to assure
that the formula is of reasonably high degree, one root of (4), call it $r_1$
(also called the principal root), is approximated by $e^K$. The other k roots
are extraneous roots which have been introduced by approximating the roots
of a first order differential equation by those of a (k+1)st order
difference equation.

Definition 2. A corrector formula is said to be relatively stable
if $|r_i| < |r_1|$ for i = 2, 3, ..., k+1 [4, 7, 8].

Definition 3. A corrector formula is said to be absolutely stable
if $|r_i| < 1$ for i = 2, 3, ..., k+1 [4, 7].

By the propagated error of a formula we mean the difference between
the $n^{\text{th}}$ computed value $y_n$ of a differential equation and the $n^{\text{th}}$ exact
value $Z_n$ and denote it by $Z_n - y_n = \epsilon_n$. Using this notation for propagated
error, R for truncation error, and $K = h\,\partial f/\partial y$, we state the propagation
equation for (1) as

(5)
The propagation equation of (6) is

$$\epsilon_{n+1} = \alpha\epsilon_n + \beta\epsilon_{n-1} + \gamma\epsilon_{n-2} + \delta\epsilon_{n-3} + \frac{R_{n+1}}{-54KA_3 - 16KA_2 - 38KA_1 - 448K + 1440}, \tag{9}$$

where

$$\alpha(K,A_1,A_2,A_3) = \frac{A_3(1440 - 756K) - 64KA_2 - 212KA_1 + 2048K}{-54KA_3 - 16KA_2 - 38KA_1 - 448K + 1440}, \tag{10}$$

$$\beta(K,A_1,A_2,A_3) = \frac{-1296KA_3 + (1440 - 384K)A_2 + 528KA_1 + 768K}{-54KA_3 - 16KA_2 - 38KA_1 - 448K + 1440},$$

$$\gamma(K,A_1,A_2,A_3) = \frac{-1836KA_3 - 1984KA_2 + (1440 - 1292K)A_1 + 2048K}{-54KA_3 - 16KA_2 - 38KA_1 - 448K + 1440},$$

$$\delta(K,A_1,A_2,A_3) = \frac{(-1440 - 486K)A_3 + (-1440 - 464K)A_2 + (-1440 - 502K)A_1}{-54KA_3 - 16KA_2 - 38KA_1 - 448K + 1440} + \frac{1440 + 448K}{-54KA_3 - 16KA_2 - 38KA_1 - 448K + 1440}.$$

For the remainder of this paper, the restriction $|K| < .4$, which is
supported in the literature [3, p. 198], will be assumed.


4. Development of the Correcting Process. In developing our
subfamily of (6), we are interested primarily in minimizing, or at least
controlling, growth of propagated error while at the same time insuring
stability. Since in the classical approach propagated error is considered
to be controlled by the roots of the characteristic equation of the
propagation equation, we have decided to try to control these roots by
controlling the propagation coefficients (10). In this paper we have
required that they be equal at each step. An extension we hope to make
in a later paper is to make a different but theoretically feasible
restriction on the propagation coefficients and compare the two procedures.
Theorem 1. If we require

$$\alpha(K,A_1,A_2,A_3) = \beta(K,A_1,A_2,A_3) = \gamma(K,A_1,A_2,A_3) = \delta(K,A_1,A_2,A_3),$$

then $\alpha$, $\beta$, $\gamma$, $\delta$, $A_1$, $A_2$, $A_3$ all become functions of K and

$$\alpha(K) = \beta(K) = \gamma(K) = \delta(K) = \frac{12K^4 + 50K^3 + 105K^2 + 120K + 60}{8K^4 - 50K^3 + 120K^2 - 120K + 240}, \tag{11}$$

$$A_1(K) = \frac{-56K^3 - 20K^2 - 10K + 72}{-20K^3 + 121K^2 - 49K + 288},$$

$$A_2(K) = \frac{72K^3 - 24K^2 + 48K + 72}{-20K^3 + 121K^2 - 49K + 288},$$

$$A_3(K) = \frac{-72K^3 + 84K^2 - 214K + 72}{-20K^3 + 121K^2 - 49K + 288}.$$

Proof: The proof consists of equating the propagation coefficients
pairwise and solving the resulting system of linear equations.

Corollary 1.1. If $|K| < .4$, then $\alpha(K)$ is a continuous, increasing
function  of  K  and  <  a (K) <  ~  . 

From these results we see that we can control the corrector being used by
monitoring $K = h\,\partial f/\partial y$.
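The relation between (10) and (11) can be checked with exact rational arithmetic. The Python sketch below substitutes the coefficient functions $A_1(K)$, $A_2(K)$, $A_3(K)$ of Theorem 1 into the four propagation coefficients (10) and compares each with the closed form (11). The function names are hypothetical and the polynomial coefficients are those read from the text, so agreement of the four values is a consistency check on that reading rather than an independent proof.

```python
from fractions import Fraction


def corrector_coefficients(k):
    """A_1(K), A_2(K), A_3(K) as given in Theorem 1."""
    q = -20 * k**3 + 121 * k**2 - 49 * k + 288
    a1 = (-56 * k**3 - 20 * k**2 - 10 * k + 72) / q
    a2 = (72 * k**3 - 24 * k**2 + 48 * k + 72) / q
    a3 = (-72 * k**3 + 84 * k**2 - 214 * k + 72) / q
    return a1, a2, a3


def propagation_coefficients(k, a1, a2, a3):
    """alpha, beta, gamma, delta of equation (10)."""
    d = -54 * k * a3 - 16 * k * a2 - 38 * k * a1 - 448 * k + 1440
    alpha = (a3 * (1440 - 756 * k) - 64 * k * a2 - 212 * k * a1 + 2048 * k) / d
    beta = (-1296 * k * a3 + (1440 - 384 * k) * a2 + 528 * k * a1 + 768 * k) / d
    gamma = (-1836 * k * a3 - 1984 * k * a2 + (1440 - 1292 * k) * a1 + 2048 * k) / d
    delta = ((-1440 - 486 * k) * a3 + (-1440 - 464 * k) * a2
             + (-1440 - 502 * k) * a1 + 1440 + 448 * k) / d
    return alpha, beta, gamma, delta


def alpha_closed_form(k):
    """The common value alpha(K) of equation (11)."""
    return ((12 * k**4 + 50 * k**3 + 105 * k**2 + 120 * k + 60)
            / (8 * k**4 - 50 * k**3 + 120 * k**2 - 120 * k + 240))


if __name__ == "__main__":
    for k in (Fraction(-2, 5), Fraction(0), Fraction(1, 4), Fraction(2, 5)):
        coeffs = propagation_coefficients(k, *corrector_coefficients(k))
        print(k, [c == alpha_closed_form(k) for c in coeffs])
```

Passing `Fraction` values for K keeps every quantity exact, so the comparison is an equality test rather than a floating-point tolerance check. At K = 0 all three $A_i$ and the common propagation coefficient reduce to 1/4.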


5. Results on Stability and Error Propagation.

Theorem 2. If the propagation coefficients (10) are required to
be equal and $|K| < .4$, then the correctors of the resulting subfamily
satisfy

(1) conditions for absolute stability,

(2) conditions for relative stability,

(3) conditions for Dahlquist stability.

Proof: Since conditions (1) and (2) are concerned with roots of
the characteristic equation of (9) with $\alpha = \beta = \gamma = \delta$

$$\tau(r) = r^4 - \alpha(K) r^3 - \alpha(K) r^2 - \alpha(K) r - \alpha(K), \tag{12}$$

we combine our investigations of these conditions. We first find the
resolvent cubic of (12)

$$x^3 + \alpha(K) x^2 + [\alpha^2(K) + 4\alpha(K)] x + [3\alpha^2(K) + \alpha^3(K)].$$

Using this we find that the discriminant of (12) is

$$D = -16\alpha^6(K) - 88\alpha^5(K) - 203\alpha^4(K) - 256\alpha^3(K).$$

Since $\alpha(K) > 0$, we have D < 0 and therefore (12) has two real and two
conjugate imaginary roots. We call $r_1$ the principal real root and
$r_3$, $r_4$ the conjugate imaginary roots. These roots satisfy

$$-\alpha(K) = r_1 r_2 r_3 r_4$$

and


With this background, our proof of absolute and relative stability is
achieved by refining inequalities sufficiently to locate the above roots.

In satisfying condition (3), we are concerned with the roots of

$$\rho(r) = r^4 - A_3 r^3 - A_2 r^2 - A_1 r - A_0. \tag{13}$$

Since $1 - A_3 - A_2 - A_1 - A_0 = 0$, we have the principal root of (13) is
$r_1 = 1$. Factoring we obtain

$$\rho^*(r) = r^3 + (1 - A_3) r^2 + (1 - A_3 - A_2) r + A_0. \tag{14}$$

It is not difficult to show that (14) has one negative real root $r_2$ and
two conjugate imaginary roots $r_3$, $r_4$. Also $r_2 r_3 r_4 = -A_0$ and
$|r_3| = |r_4| = \sqrt{-A_0 / r_2}$. We again refine inequalities to locate the roots.
With this result, we have shown that the correctors used throughout
our process will be numerically stable. In the paper referenced above,
Hull and Newberry gave as a condition for reducing growth of propagated
error, maximizing $\sum_{i=0}^{4} B_i$ while minimizing truncation constant $E_{n+1}$. In our
process, $\sum_{i=0}^{4} B_i$ becomes an increasing function of K and $1.89 < \sum_{i=0}^{4} B_i < 3.2$
while $|E_{n+1}| < .0159$. This compares favorably with the fifth order Adams'
corrector which has $|E_{n+1}| = .0187$ and $\sum_{i=0}^{4} B_i = 1$. If we considered the
propagation coefficients to be fixed and equal throughout the process (as
is the classical approach), the idea that the propagated error is dominated
by the principal root of the propagation equation is equivalent to assuming
that the error grows geometrically, i.e., $\epsilon_n = (1+r)^n \epsilon_0$, where $\epsilon_0$ is
the initial propagated error and $|r| < 1$. Under the same conditions of equal
and  fixed  propagation  coefficients  (call  them  a)  with  the  additional  restric¬ 
tions  of  f.  s  t  .  z  e  r,  z  £.  s  3  ~  £.  es  cr  and  truncation  error 

n  n-1  n-d  h  J  2  1  0  I 

£ 

T  =  r  fixed  at  each  step,  the  idea  of  dominance  of  principal  root 
4 


(r  re)  yields 


n+1  »  cE[3a  +  (;t-l)eK  +  (a-l)e2K  +  («-l)e3K] 

+  (n-nK[4e3K  +  3e2K  +  2<K  +  ,,  +  r  "jj4  _ 

i-0 


We now obtain a general expression for the $(n+1)^{\text{st}}$ propagated error of our
system by applying methods from Calculus of Finite Differences by Charles
Jordan [6, p. 587].


Theorem 3. If we require propagation coefficients to be equal at
each step, but allow change from step to step as previously described,

(15)

(n+1) -4  n+1 

T  C  (K  ,,,m)  T.  .  +  F  v(K  x  r: ,  . . .  , 

n+1’  (n+1) -m  h  .  „  n+l,m)  (n+l)-m 

m=0  m=(n+l)-3 


with 


n+1 


v(Kn+i,m 


)  =  7  a.  (K  ina,  (K  .  >*. 

“  J,  n+1)  j,  n+l-j  3 


) a .  (K  .  .  s 

n+l-Jj^-3  2).' 


1  J3 


lj.(’Kn+l-m+j1)  , 
where


It should be noted that in the language of combinatorial analysis,
the number of terms in the coefficient of $T_{(n+1)-m}$ is precisely the
number of compositions of m with no restriction on the number of parts
but with no part greater than 4. This is denoted c(m,4).
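The count c(m,4) is easy to tabulate. A composition of m either ends in a part of size 1, 2, 3, or 4, so the count for m is the sum of the counts for m-1, m-2, m-3, and m-4. The Python sketch below implements that recurrence; the function name and the convention that the empty composition counts once for m = 0 are assumptions of the example.

```python
def compositions_max_part(m, max_part=4):
    """Number of compositions of m with every part at most max_part.

    Uses the recurrence c(m) = c(m-1) + ... + c(m-max_part), with c(0) = 1
    for the empty composition. Hypothetical helper for illustration.
    """
    counts = [1] + [0] * m
    for total in range(1, m + 1):
        counts[total] = sum(counts[total - part]
                            for part in range(1, min(max_part, total) + 1))
    return counts[m]


if __name__ == "__main__":
    print([compositions_max_part(m) for m in range(8)])
```

For m up to 4 every composition qualifies, so the counts double (1, 2, 4, 8); from m = 5 on the restriction removes some compositions and the growth slows.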


Corollary  3.1.  If  the  initial  propagated  errors 


e3  '  '  2 


T.  ...  =  T  at  each  step  and 

(n+l)-ra 


“1  ‘'0 

Clm  =  max^(Kn+1)»  -*  . a^Kn+1-fm-n^» 

then 


(n+1) -4  .j  n+1 

En+1ldTl  I  c(m,4)«  +  |c|  I  <=(“•*>“.  ■ 

m=0  m=(n+l)-3 


m 


where  j  is  the  smallest  Integer  satisfying  j 


We obtain another general bound for $\epsilon_{n+1}$ by applying a theorem of
Hadamard [1] which states that

$$\det A \le \prod_{i=1}^{n} r_i, \quad \text{where } r_i = \sqrt{\sum_{k=1}^{n} a_{ik}^2}$$

and $A = (a_{ik})$ is an $n \times n$ matrix.


Corollary  3.2.  If  the  initial  propagation  errors 

it  each  i 

(n+1) -4  * 


en  =c,=e„=e_=e  and  T,  =  T  at  each  step,  then 

0123  (n+l)-m 


f.  |  | T |  [1  +  3a  +  2*2  +  2a J  +  £  2a  .  (l-4-2a)m  4(n/3a). 


*3 


*,  ra-4 . 


n+1 


m=4 


n+1 


(1+/2*  )  .  (1-ta)]  +  |  e  j  2*.  (1+2* )m  4  .  (l+/3a)  (l4/2a)  (1+  a  J 

ra= (n+1) -3 


where  a  =  max  {a(Kn+^),  a(Kn)»  •••  >  a(K^)}  • 


Many  other  bounds  may  be  found  with  additional  restrictions  on  K  and 
the  differential  equation  being  considered. 


6. Numerical Results. In order to test our process in actual numerical
calculations, we have chosen four differential equations with initial
conditions. These choices are supported in the literature [3, p. 209] and
are designed to test our process for stability and for solving higher
order equations and systems of equations. In order to give some idea of the
value of our process, we will compare its results with results obtained
by using the Adams' corrector alone. The Adams' formulas are widely
recognized as standards by which the stability properties of other processes
can be measured. We compare the processes by comparing the two quantities
error and relative error. Letting z(m) denote the $m^{\text{th}}$ exact value of y,
we define


$$\text{Error} = |y(m) - z(m)|$$

and

$$\text{Relative Error} = \frac{\text{Error}}{z(m)}.$$


Many references consider relative error to be more meaningful than error
since it takes into consideration the size of the value being calculated.
Tables  1  through  8  give  values  for  error  and  relative  error  for  each 
equation  for  stepsizes  h  =  0.05  and  h  =  0.01.  The  next  four  tables  give 
a  comparison  of  stepwise  change  of  relative  error  and  the  final  two  tables 
give  comparisons  of  the  growth  of  relative  error  over  the  entire  range  of 
the  calculations. 


Although our results were obtained to 15 significant figures, we have
rounded the figures in our tables to 2 or 4 places whichever is more
meaningful in each case. We denote our process with equal propagation
coefficients by EPC. To make the process more realistically useful, we
have designed our computer program to approximate $h\,\partial f/\partial y$ at each step of the
calculations instead of feeding in the exact values for this quantity
($.5142 \times 10^{-5}$ is denoted $.5142^{-05}$).
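A short Python sketch of the two error measures and of the compact table notation may help in reading the tables. The function names are hypothetical, and the formatter assumes a four-digit mantissa between .1 and 1 followed by a signed two-digit exponent, as in the example $.5142^{-05}$ above.

```python
def error_and_relative_error(y_m, z_m):
    """Error = |y(m) - z(m)| and Relative Error = Error / z(m)."""
    err = abs(y_m - z_m)
    return err, err / z_m


def table_notation(value):
    """Format a number as in the tables, e.g. 0.5142e-5 -> '.5142-05'.

    Hypothetical helper: four mantissa digits, then the signed power of ten.
    """
    if value == 0:
        return ".0000+00"
    mantissa, exponent = f"{abs(value):.3e}".split("e")
    digits = mantissa.replace(".", "")       # '5.142' -> '5142'
    sign = "-" if value < 0 else ""
    # Moving the decimal point in front of the digits raises the exponent by one.
    return f"{sign}.{digits}{int(exponent) + 1:+03d}"


if __name__ == "__main__":
    err, rel = error_and_relative_error(1.5, 2.0)
    print(table_notation(err), table_notation(rel))
```

Dividing by z(m) rather than its absolute value follows the definition as printed; for solutions that change sign a reader may prefer to divide by |z(m)|.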


The author wishes to express his sincere appreciation to Professor
James R. Wesson for his valuable guidance during this study. Much of
this work was supported by NASA contract NAS 8-2559.
TABLE 1

COMPARISON OF EPC AND ADAMS' FORMULAS
FOR y' = xy AT h = 0.05

1.00 

2.00 

3.00 

4.00 

5.00 

6.00 

7.00 

8.00 

9.00 


ERROR 

RELATIVE  ERROR 

ERROR 

RELATIVE  ERROR 

.7484“°7 

.^539"°7 

.25^7"°6 

.1345“°6 

•2577‘°5 

.3488“°6 

.924l"03 

.I25l’°5 

.1657“°3 

.l84l"°5 

.6l50"°3 

.6832”03 

.2204"01 

.7394~°5 

,8j427~°1 

.2827"014 

.645l+01 

.2404“0i| 

.255^0? 

.9435’021 

.b3bf0h 

.662 0~0h 

,1747+03 

.266l"03 

.6976+OT 

•1597’03 

,2867+°8 

.6565"03 

•  375711 

.3464"03 

.1149+12 

.1453"02 

.2672+15 

.6884" 03 

.ll46+l6 

.2955  '02 

.659^+19 

-02 

.1272 

•5577-02 

10.00 


TABLE  2 

COMPARISON  OF  EPC  AND  ADAMS'  FORMULAS 
FOR y' = xy AT h = 0.01


EPC  ADAMS' 


X 

ERROR 

RELATIVE  ERROR 

ERROR 

RELATIVE  ERROR 

1.00 

.2606"10 

.158l‘10 

•9797'10 

.59li2-10 

2.00 

.92h0~°9 

.1250"°9 

.3907-08 

.  W°9 

5.00 

.62 66 “°7 

.696l"09 

.2395"06 

. 2661 “°8 

U.00 

.88 24 ~05 

.2960“08 

.339^'021 

.1138"07 

5.00 

,275y"09 

.1020_°7 

.1059"01 

•3945"07 

6.oo 

•1957+01 

.298l“07 

,7607+01 

•1158-06 

7.00 

.?358+04 

.7642”07 

.1303+°5 

.2984'06 

8.00 

•139?+°8 

.17 6h~06 

.5^62+08 

•6917'06 

9.00 

,1^50+12 

.3735-06 

•5709+1? 

.U71*05 

L0.00 

•3820+l6 

•7368”06 

.1510+1T 

.2913“05 

10.00 


TABLE  3 

COMPARISON  OF  EPC  AND  ADAMS'  FORMULAS 
FOR y' = -xy AT h = 0.05

EPC  ADAMS ' 


X 

ERROR 

RELATIVE  ERROR 

ERROR 

RELATIVE  ERROR 

1.00 

.1676-08 

.2764“08 

.86s-r08 

.l427‘°7 

2.00 

•519?-08 

.2361 -°7 

.7237-07 

.Q139"07 

3-00 

.1185“°8 

.1067”06 

.3838"08 

.71135"08 

4.00 

•1177-09 

.3  tcrr-06 

.2985”^ 

.8B98"06 

5.00 

.2974"10 

.7980"°^ 

.93 86" 10 

.2519*"°^ 

6.00 

•7273"12 

.i,776'011  ! 

.2363 -1] 

.1552"°^ 

7.00 

.kWlh 

.1897'05 

•  l437”1":> 

.6278“°^ 

8.00 

.7613"17 

.60H"0-’ 

.?».77-16 

-02 

.2033 

9.00 

.h?hl~20 

.1646“02 

.lUB^-19 

.5763-02 

10.00 

.7d‘)3~2h 

J.074"02 

.2921 

.lc;l4"01 

TABLE 4


COMPARISON OF EPC AND ADAMS' FORMULAS
FOR y' = -xy AT h = 0.01

TABLE 5


COMPARISON OF EPC AND ADAMS' FORMULAS
FOR y' = -2xy^2 AT h = 0.05


EPC  ADAMS' 


X 

ERROR 

RELATIVE  ERROR 

r  — — *  —  ■ 

ERROR 

RELATIVE  ERROR 

1.00 

.L972"07 

.9945-07 

.2118"06 

•  i.237-06 

2.00 

.38Lo"°9 

•  19»-08 

.3508"08 

.179L'07 

3.00 

.322l“°9 

•3221-08 

.1613"08 

.l6l3"07 

L.00 

.1913’09 

•3252-08 

.8701"09 

.lhJ9~07 

5.00 

.9983"10 

.2596"08 

.4432"09 

.1152”07 

o .  00 

.5399" 0 

•199S-08 

.2;-7h-r'‘ 

co 

c 

1 

-3 

CO 

D- 

CC 

7.00 

.3101"10 

•1550-08 

.1557-04 

.6784” 08 

8.00 

.1885"10 

.1225"08 

.8227‘10 

.53^8'°8 

9.00 

.1204"10 

•9877~°9 

.52U8'10 

.1,70J-08 

10.00 

.8025”11 

.8l06"°9 

. 

.3493"1C 

.3 52 8“ 08 
TABLE 6


COMPARISON OF EPC AND ADAMS' FORMULAS
FOR y' = -2xy^2 AT h = 0.01


EPC  ADAMS’ 


X 

ERROR 

■ 

RELATIVE  ERROR 

ERROR 

— 

RELATIVE  ERROR 

1.00 

.8937"11 

.1787'10 

.363l‘10 

.7261-10 

0 . 00 

.6837-12 

•3U19-11 

.1223 "10 

o 

O 

• 

y\ 

-1  0 

,10b0 

.10/40"11 

.3685"12 

.3685"11 

i*.oo 

.11 65  ”1 5 

.1980 “12 

.3398'15 

.6117"12 

o 

o 

S' 

.853s"15 

.2219”13 

.5120-Iit 

.1331"12 

A. 00 

.7393"15 

.r.sc-Vl 

.2731 ~1? 

7.00 

.165/*  "Il; 

.84l8'13 

.S060"3l‘ 

.233 O-12 

A.  00 

.1319_lU 

.8r)75'15 

.2820~3^ 

.l833"”12 

7.00 

•913 l“i5 

.730L’13 

.I2l0-llt 

.  9922  ”1 

10.00 

.672 8"15 

-IS 

.1700  1 

.1717"15 

TABLE  7 

COMPARISON OF EPC AND ADAMS' FORMULAS
FOR y'' = -(xy'+y)/(xy)^2 AT h = 0.03

EPC  ADAMS'


y 

ERROR- 

- - 

RELATIVE  ERROR 

ERROR 

RELATIVE  ERROR 

.00 

.-03 

.  !'n  1 

-09 

Jt033  • 

0 

1 

CT\ 

On 

c 

1.00 

-Qii 

•  51 9  v 

.-OL 
.  1996 

.l/*32"°? 

.9796"0'1 

TABLE  8 


COMPARISON  OF  EPC  AND  ADAMS'  FORMULAS 
FOR y'' = -(xy'+y)/(xy)^2 AT h = 0.01


EPC  ADAMS' 


X 

ERROR 

RELATIVE  ERROR 

ERROR 

RELATIVE ERROR

4.00 

.323 -r07 

.i6or)’07 

.]Oon-°7 

.8f.vr08 

19.00 

.1  <sr~of' 

.7!>l  8-nv 

.]0,)rnr 

.5941 "°7 

TABLE  9 

COMPARISON OF STEPWISE GROWTH OF RELATIVE
ERROR OF EPC AND ADAMS' FORMULAS FOR y' = xy


h = 0.03          h = 0.01


X 

EPC 

- ■■  ■■■■  — 

ADAMS' 

r— . 

EPC 

ADAMS' 

0 

0 

CJ 

1 

0 

0 

r-i 

7.68 

8.10 

7.91 

7.99 

2.00-5.00 

5.28 

5.46 

5.57 

5.0,1 

5 . 00-4 . 00 

4.0? 

4 .14 

4.25 

4  .28 

4.00-5.00 

5.25 

5.54 

’’  •  4  r' 

5.47 

5.00-8.00 

2.75 

2.8? 

p  „  op 

?.  04 

6.00-7-00 

S.41 

2.47 

• 

O  r-  Q 

7.00-8.00 

2.17 

0 ,22 

2.5] 

8.00-Q.00 

1 .99 

9.00-10.00 

i .  m 

1  .Ro 

]  .0  • 

'!  .  0  P 
TABLE 10


COMPARISON OF STEPWISE GROWTH OF RELATIVE
ERROR OF EPC AND ADAMS' FORMULAS FOR y' = -xy

h = 0.03          h = 0.01


X 

EPC 

ADAMS’ 

EPC 

ADAMS f 

1.00-2.00 

8.91* 

6.1*0 

1*  .90 

l*.68 

2 . 00-3 . 00 

1+.52 

3.78 

1*  .86 

lt.1+0 

3 . 00-1* .  00 

3.29 

2.38 

3.06 

2.90 

1*. 00-5. 00 

22.78 

28.31 

21.1+3 

22.38 

3.00-6.00 

3-99 

6.l6 

3.60 

9.62 

6.00-7.00 

3. 97 

1*  .05 

3.70 

3.70 

7.00-8.00 

3.17 

3.2U 

2.93 

2.93 

8.00-0.00 

2.1b 

2.87 

2.81 

2.51 

9.00-10.00 

2.1*8 

2.67 

2.2b 

2.2l* 

TABLE  11 


COMPARISON  OF  STEPWISE  GROWTH  OF  RELATIVE 
ERROR OF EPC AND ADAMS' FORMULAS FOR y' = -2xy^2

h = 0.03          h = 0.01


X 

— 

EPC 

ADAMS' 

EPC 

r 

ADAMS' 

1 .00-2.00 

.  0193 

.01+11+ 

.1917 

.1681* 

.  .00-3.00 

1.6776 

.9196 

.701*2 

.3013 

5 . 00-1+ .  00 

1 . 0096 

.9169 

.1901+ 

.1660 

1+. 00-5. 00 

.7983 

.7789 

.1121 

.2176 

0.00-6.00 

.7696 

.7625 

3.1+218 

2.0318 

.00-7.00 

.  '77'-'8 

.7727 

1  ]  riQ'7 

.  9?6J+ 

•'.00-8.00 

.7903 

.788-. 

1.0187 

.721*9 

•1.00-9.00 

.  8063 

.801+6 

.8701 

.91+13 

-y.  00-10. 00 

b 

Cvj 

CO 

•  8199 

_ 

.  8966 

.1771 

TABLE 12

GROWTH OF RELATIVE ERROR FROM x = 1.00 TO
x = 10.00 FOR EPC AND ADAMS' FORMULAS


h = 0.03          h = 0.01

EPC 

EPC 

ADAMS’ 

>1 ,7 

.2802+0'; 

.  76]  0+°^ 

.  +0* ' 
.1+9+0 

-XV 

.ll+7l++07 

.1061 + 07 

.‘S0l6+°A 

. U of 6+ 

\  P 

-?xy 

.ai'-r°° 

.8727-02 

.7v6*r^ 
TABLE 13


GROWTH OF RELATIVE ERROR OF EPC AND ADAMS'
FORMULAS FROM x = 4.00 TO x = 19.00 FOR
y'' = -(xy'+y)/(xy)^2

h = 0.03          h = 0.01


EPC 

ADAMS' 

EPC 

ADAMS' 

.L850+01 

.496o+01 

JM3+01 

.!i78ci+01 



ITERATIVE  SOLUTION  OF  ABSTRACT  POLYNOMIAL  EQUATIONS 

L. B. Rall*

Mathematics  Research  Center 
University  of  Wisconsin 
Madison,  Wisconsin 

ABSTRACT. The nonlinearities in many equations of practical importance
are polynomial in character, viewed from the abstract standpoint
of multilinear operators in Banach spaces. After appropriate definitions
and theorems on the properties of such operators, a standard class of
abstract polynomial equations is identified. For equations in this class,
existence and uniqueness of solutions may be established by simple
calculations with appropriate scalar majorant polynomials. These results also
give conditions for the convergence of the method of successive
substitutions (simple iteration), Newton's method, and the modified form of
Newton's method to a solution, including numerical values for error
estimates. For the class of abstract polynomial equations considered,
the method of successive substitutions and the modified form of Newton's
method are shown to be identical.

The  theory  is  illustrated  by  application  to  finite  polynomial  systems, 
and  to  particular  polynomial  integral  and  differential  equations. 

1. INTRODUCTION. The nonlinear operator equations considered in
this paper are natural generalizations of scalar polynomial equations to
the more abstract setting of Banach spaces. This class of abstract
polynomial equations includes a number of interesting differential and integral
equations, which contain nonlinearities consisting of powers or products
of the unknown functions, mingled perhaps with linear differential or
integral operators. A theory of abstract polynomial equations of
completeness comparable to the theory of scalar polynomial equations would
be highly desirable, but is not available at the present time. After
framing suitable definitions, the present work will proceed to the
construction of local theories for a specific class of abstract polynomial
equations, the ones which will be called regular at some point $x_0$ of
the Banach space X. These local theories will give information concerning
the existence, uniqueness, and construction of a solution in a neighborhood
of $x_0$. The methods used are based on the well-known theorems for the
solution of operator equations by the technique of successive substitutions
(simple iteration) and by Newton's method [13]. The requisite facts for
the abstract polynomial equation are deduced from the corresponding result
for a scalar majorant polynomial, which can be obtained by fairly simple
direct calculations.
2. Definitions and notation. Let X and Y denote linear spaces (not
necessarily distinct) over a common scalar field $\Lambda$, which will be
restricted to be the real or complex number system. The space of linear
(additive and homogeneous) operators L from X into Y will be denoted by
$\mathcal{L}(X,Y)$. A linear operator B from X into $\mathcal{L}(X,Y)$ is
called a bilinear operator from X into Y, for

(2.1)  $B x_1 x_2 = (B x_1) x_2$

is an element of Y for all $x_1, x_2 \in X$. It follows that the bilinear
form (2.1) is linear in each argument. For $x_1 = x_2 = x$, the nonlinear
mapping from X into Y defined by

(2.2)  $y = Bxx$

is a simple generalization of multiplying the square of a scalar variable by a
constant coefficient [11]. By a simple induction [13, pp. 100-108], one may
define multilinear operators and polynomial operators of arbitrary degree.
For $n = 2, 3, \dots$, let $\mathcal{L}(X^n,Y)$ denote the set of linear
operators N from X into $\mathcal{L}(X^{n-1},Y)$. The operator N will be
called an n-linear operator from X into Y.

If $N \in \mathcal{L}(X^n,Y)$, and n points $x_1, x_2, \dots, x_n \in X$ are
given, then

(2.3)  $y = N x_1 x_2 \cdots x_n$

will be a point of Y, the convention being that N operates on $x_1$, the
(n-1)-linear operator $N x_1$ operates on $x_2$, and so on. In general, the
order of operation is important. For a permutation
$i = (i_1, i_2, \dots, i_n)$ of the integers $1, 2, \dots, n$, the notation
$N(i)$ will be used for the n-linear operator from X into Y such that
(2.4)  $N(i)\, x_1 x_2 \cdots x_n = N x_{i_1} x_{i_2} \cdots x_{i_n}$

for all $x_1, x_2, \dots, x_n \in X$.

Thus, there are $n!$ n-linear operators $N(i)$ associated with a given
n-linear operator N.

An n-linear operator N from X into Y is said to be symmetric if

(2.5)  $N = N(i)$

for all $i \in \Pi_n$, where $\Pi_n$ denotes the set of all permutations of
the integers $1, 2, \dots, n$. The symmetric n-linear operator

(2.6)  $\bar{N} = \frac{1}{n!} \sum_{i \in \Pi_n} N(i)$

is called the mean of N.

An alternative definition of symmetry of an n-linear operator N would be
to require that $N = \bar{N}$. A simple nonlinear operator from X into Y is
obtained by taking $x_1 = x_2 = \dots = x_n = x$ in (2.4). The result is an
obvious generalization of the product of the nth power of a scalar variable
and a constant coefficient.

It will be convenient at times to use the notation

(2.7)  $N x^m = N x x \cdots x$,

$m \le n$, $N \in \mathcal{L}(X^n,Y)$, for the result of applying N to
$x \in X$ m times. If $m < n$, then (2.7) will represent an (n-m)-linear
operator from X into Y. For the special case $m = n$, note that

(2.8)  $N x^n = \bar{N} x^n = N(i)\, x^n$

for all $i \in \Pi_n$, $x \in X$.
For $A_i \in \mathcal{L}(X^i,Y)$, $i = 1, 2, \dots, n$, and $A_0 \in Y$, the
operator P defined by

(2.9)  $P(x) = A_n x^n + \dots + A_2 x^2 + A_1 x + A_0$

is called an abstract polynomial operator of degree n from X into Y. The
equation

(2.10)  $P(x) = 0$

is called an abstract polynomial equation of degree n.

It follows from (2.8) that the multilinear operators $A_2, A_3, \dots, A_n$
in (2.9) may be assumed to be symmetric without loss of generality, since each
$A_i$ in (2.9) may be replaced by $\bar{A}_i$, $i = 2, 3, \dots, n$, without
changing the value of P(x). Unless the contrary is explicitly stated, the
multilinear operators in all abstract polynomials considered henceforth will
be assumed to be symmetric.

The operator

(2.11)  $P^{(m)}(x) = n(n-1)\cdots(n-m+1) A_n x^{n-m}
         + (n-1)(n-2)\cdots(n-m) A_{n-1} x^{n-m-1} + \dots + m!\, A_m$

is called the mth derivative of the abstract polynomial operator P,
$m = 1, 2, \dots, n$. Note that $P^{(m)}(x) \in \mathcal{L}(X^m,Y)$ for
$m = 1, 2, \dots, n$, and that $P''(x), P'''(x), \dots, P^{(n)}(x)$ are
symmetric multilinear operators. The computation of P(x) and its derivatives
at a point $x = x_0$ may be accomplished by adapting Horner's algorithm for
scalar polynomials to this purpose [13, p. 111]. An algebraic formulation of
this algorithm may be obtained by setting
(2.12)  $A_i^{(0)} = A_i$,  $i = 0, 1, \dots, n$,
        $A_n^{(j)} = A_n$,  $j = 1, 2, \dots, n+1$,

and calculating

(2.13)  $A_{n-k}^{(j+1)} = A_{n-k+1}^{(j+1)} x_0 + A_{n-k}^{(j)}$,
        $j = 0, 1, \dots, n-1$;  $k = 1, 2, \dots, n-j$.

The results of this calculation are

(2.14)  $A_j^{(j+1)} = \frac{1}{j!} P^{(j)}(x_0)$,

$j = 0, 1, \dots, n$, the notation $P^{(0)}(x_0)$ being used for $P(x_0)$.
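The recurrences (2.12)-(2.14) can be sketched for the scalar case, where each coefficient is a number rather than a multilinear operator. The function name `horner_taylor` and the list layout of the coefficients are assumptions made for this illustration only.

```python
def horner_taylor(coeffs, x0):
    """Repeated Horner sweeps for a scalar polynomial.

    coeffs[i] is a_i in p(x) = a_n x^n + ... + a_1 x + a_0.
    Returns [p(x0), p'(x0)/1!, p''(x0)/2!, ..., p^(n)(x0)/n!].
    """
    a = list(coeffs)          # a[i] starts as A_i^(0) = A_i
    n = len(a) - 1
    taylor = []
    for j in range(n + 1):
        # Sweep (2.13): update indices n-1, n-2, ..., j (k = 1, ..., n-j).
        for i in range(n - 1, j - 1, -1):
            a[i] = a[i + 1] * x0 + a[i]
        # After sweep j, a[j] holds A_j^(j+1) = p^(j)(x0) / j!  as in (2.14).
        taylor.append(a[j])
    return taylor


# p(x) = x^3 - 2x + 5 at x0 = 2: p = 9, p' = 10, p''/2 = 6, p'''/6 = 1
print(horner_taylor([5, -2, 0, 1], 2))
```

The returned values are exactly the coefficients of the Taylor expansion of p about x0, which is why the same sweeps also produce the shifted polynomial used later in the text.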
Remark 2.1. Taylor's identity

(2.15)  $P(x) = P(x_0) + P'(x_0)(x - x_0) + \frac{1}{2} P''(x_0)(x - x_0)^2
         + \dots + \frac{1}{n!} P^{(n)}(x_0)(x - x_0)^n$

holds at any $x_0 \in X$ [13, p. 111].

An abstract polynomial operator P is said to be regular at $x_0$ if the
linear operator $P'(x_0)$ is one-to-one and onto Y, so that the (left) inverse
$[P'(x_0)]^{-1}$ exists.

If P is regular at $x_0$, then one may set

(2.16)  $B_i = (i!)^{-1} [P'(x_0)]^{-1} P^{(i)}(x_0)$,  $i = 1, 2, \dots, n$,
        $h = x - x_0$,
        $B_0 = [P'(x_0)]^{-1} P(x_0)$,

to obtain the abstract polynomial in X,

(2.17)  $R(h) = B_n h^n + \dots + B_2 h^2 + h + B_0$.
The problem of solving the polynomial equation $P(x) = 0$ is equivalent to
that of the polynomial equation

(2.18)  $R(h) = 0$,

provided that P(x) is regular at $x_0$.

Henceforth, attention will be restricted to polynomial equations of the form
(2.18).

3. Bounds for polynomial operators in Banach spaces. Henceforth, X and Y
will be assumed to be Banach spaces over $\Lambda$, that is, complete normed
linear spaces. Since confusion is unlikely, the norm in either space will be
denoted by $\|\cdot\|$. Considering only bounded operators, the spaces
$\mathcal{L}(X^n,Y)$, $n = 1, 2, \dots$, will also be Banach spaces [7] for
the norm

(3.1)  $\|N\| = \sup_{\|x\| = 1} \|Nx\|$.

For $n = 1$, N will simply be a linear operator from X into Y.
If $N \in \mathcal{L}(X^n,Y)$ and $m \le n$, then

(3.2)  $\|N x^m\| \le \|N\| \cdot \|x\|^m$.

An abstract polynomial operator P from X into Y of degree n defined by

(3.3)  $P(x) = A_n x^n + \dots + A_2 x^2 + A_1 x + A_0$

is said to be bounded if its coefficients $A_i$, $i = 1, 2, \dots, n$, are
bounded multilinear operators from X into Y. If

(3.4)  $a_i \ge \|A_i\|$,

$i = 0, 1, \dots, n$, the real polynomial

(3.5)  $p(r) = a_n r^n + \dots + a_2 r^2 + a_1 r + a_0$

is called a scalar majorant polynomial for the bounded abstract polynomial
operator (3.3).

If (3.5) is a scalar majorant polynomial for the operator defined by (3.3),
then for $\|x\| \le r$,

(3.6)  $\|P(x)\| \le p(r)$,

(3.7)  $\|P^{(i)}(x)\| \le p^{(i)}(r)$,  $i = 1, 2, \dots, n$,

and

(3.8)  $\|P^{(n+j)}(x)\| = p^{(n+j)}(r) = 0$,  $j = 1, 2, \dots$.

The formal differentiation process defined in § 2, when applied to bounded
abstract polynomial operators, yields their Fréchet derivatives
[13, pp. 108-115]. In general, the mth Fréchet derivative of an operator F
from X into Y will be a bounded, symmetric m-linear operator from X into Y
[7, 11]. For operators which have a continuous mth Fréchet derivative, the
following estimate holds:

(3.9)  $\left\| F(x) - \sum_{k=0}^{m-1} \frac{1}{k!} F^{(k)}(x_0)(x - x_0)^k \right\|
        \le \frac{1}{m!} \sup_{\bar{x} \in [x_0, x]} \|F^{(m)}(\bar{x})\| \, \|x - x_0\|^m$,

where

(3.10)  $[x_0, x] = \{ \bar{x} : \bar{x} = \theta x + (1 - \theta) x_0,\ 0 \le \theta \le 1 \}$

is the line segment from $x_0$ to x [13, p. 125; 6, 8].

Theorem 3.1. If P(x) is a bounded abstract polynomial operator, and
p(r) a scalar majorant polynomial for it, then

(3.11)  $\left\| P(x) - \sum_{k=0}^{m-1} \frac{1}{k!} P^{(k)}(x_0)(x - x_0)^k \right\|
         \le \frac{1}{m!} p^{(m)}(R) \, \|x - x_0\|^m$,

where

(3.12)  $R = \max\{ \|x\|, \|x_0\| \}$.

Proof: This follows directly from (3.6)-(3.9), and the fact that

(3.13)  $\|\bar{x}\| = \|\theta x + (1 - \theta) x_0\| \le \theta R + (1 - \theta) R = R$.

In  the  following  sections,  results  obtained  by  considering  scalar  majorant 
polynomials  will  be  used  to  survey  various  iterative  methods  for  solving  abstract 
polynomial  equations. 

4. Solution of abstract polynomial equations by successive substitutions. It
will be assumed that the bounded abstract polynomial operator P is regular
at some $x_0 \in X$, and the transformations (2.16) have been performed to put
the polynomial equation

(4.1)  $P(x) = 0$

into the form

(4.2)  $B_n h^n + \dots + B_2 h^2 + h + B_0 = 0$

in the space X. Ordinarily, one would be motivated to choose $x_0$ as an
initial approximation to a solution $x^*$ of (4.1), and then seek a solution
$h = h^*$ of (4.2) in the vicinity of the origin 0 of X. Of course, the finding
of a suitable $x_0$ may be a significant problem in its own right. A general
prescription for finding $x_0$ would be equivalent to a complete solution of
the program of solving abstract polynomial equations, and is not available at
the present time. However, one usually has some useful information about the
specific problem under consideration, or can find $x_0$ by some approximation
procedure. For example, by choosing a basis $x_1, x_2, \dots, x_m$ for a
finite-dimensional subspace $X_m$ of X (which itself may be of finite
dimension > m), one may try to find constants
$\alpha_1, \alpha_2, \dots, \alpha_m$ such that
$x_0 = \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_m x_m$ minimizes
$\|P(x)\|$, or perhaps better, $\|[P'(x)]^{-1} P(x)\|$. If m is sufficiently
small, this problem may be computationally tractable. In the following, it
will be assumed that $n \ge 2$, so that (4.2) will be nonlinear.

One commonly used technique for the solution of equations is the method
of successive substitutions (or iteration). Equation (4.2) is written in the
form

(4.3)  $h = F(h)$,

where

(4.4)  $F(h) = -(B_n h^n + \dots + B_2 h^2 + B_0)$.

Solutions of (4.3) are called fixed points of the operator F. The form of
(4.3) suggests the iteration

(4.5)  $h_0 = 0$,  $h_{k+1} = F(h_k)$,

$k = 0, 1, 2, \dots$. If the sequence $\{h_k\}$ converges to $h = h^*$, then
$h^*$ will be a solution of (4.3), and hence of (4.2). The point

(4.6)  $x^* = x_0 + h^*$

will then satisfy the original polynomial equation (4.1).
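The iteration (4.5) can be sketched for a scalar polynomial, where each $B_i$ is a number. The function name, the coefficient list layout, and the stopping rule below are assumptions for illustration; the text itself does not prescribe a stopping rule.

```python
def successive_substitution(B, tol=1e-12, max_iter=200):
    """Iterate h_{k+1} = F(h_k), h_0 = 0, for the scalar equation
    B_n h^n + ... + B_2 h^2 + h + B_0 = 0.

    B[i] is the coefficient B_i; B[1] is ignored because the
    linear term is normalized to h.
    """
    def F(h):
        # F(h) = -(B_n h^n + ... + B_2 h^2 + B_0), as in (4.4)
        return -(B[0] + sum(B[i] * h**i for i in range(2, len(B))))

    h = 0.0
    for k in range(max_iter):
        h_next = F(h)
        if abs(h_next - h) <= tol:   # assumed stopping rule
            return h_next, k + 1
        h = h_next
    raise RuntimeError("iteration did not converge")


# Illustrative coefficients (not from the text): 0.5 h^2 + h + 0.1 = 0
h_star, steps = successive_substitution([0.1, 1.0, 0.5])
print(h_star, steps)
```

For these illustrative coefficients the iterates approach the root of smaller magnitude, $-1 + \sqrt{0.8}$, which lies near the origin as the discussion above anticipates.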

Conditions for the convergence of this process are given by a
theorem of Banach [1] and Caccioppoli.

Theorem 4.1. If a non-negative constant $\mu < 1$ exists such that

(4.7)  $\|F(x) - F(y)\| \le \mu \|x - y\|$

for all $x, y \in U(r)$, where

(4.8)  $U(r) = \{ x : \|x\| \le r \}$,

and

(4.9)  $r \ge \frac{\eta_0}{1 - \mu}$,

where

(4.10)  $\eta_0 = \|h_1 - h_0\| = \|B_0\|$,

then the sequence $\{h_k\}$ generated by (4.5) converges to a fixed point
$h^*$ of F which is unique in $U(r)$, with

(4.11)  $\|h^* - h_k\| \le \frac{\mu^k}{1 - \mu} \eta_0$,

$k = 0, 1, 2, \dots$.

This theorem is simple to prove [13], and gives information concerning
the existence of a solution $h^*$ of (4.3) near the origin 0 of X, and the
error bound (4.11) for the terms of the sequence $\{h_k\}$ as approximations
to $h^*$.

For the present purposes, it is required to obtain a bound $\mu$ for the
operator defined by (4.4). If

(4.12)  $b_i \ge \|B_i\|$,  $i = 2, 3, \dots, n$,

then F(h) has the scalar majorant polynomial

(4.13)  $f(r) = b_n r^n + \dots + b_2 r^2 + \eta_0$

for $h \in U(r)$. Similarly,

(4.14)  $\|F'(h)\| \le f'(r) = n b_n r^{n-1} + \dots + 2 b_2 r$

if $h \in U(r)$. By Theorem 3.1 (see inequality (3.11)),

(4.15)  $\|F(x) - F(y)\| \le f'(r) \|x - y\|$

for $x, y \in U(r)$. From (4.14), $f'(0) = 0$, and for $r > 0$, $f'(r)$ is a
positive, strictly monotone increasing function which goes to infinity as
$r \to +\infty$. Let $r = R$ be such that

(4.16)  $f'(R) = 1$.


Then, for $0 \le r < R$, the function

(4.17)  $r_0(r) = \frac{\eta_0}{1 - f'(r)}$

is positive (assuming $\eta_0 > 0$), strictly convex and monotone increasing,
and goes to $+\infty$ as $r \to R$.

Under the conditions given, the curve defined by (4.17) will intersect
the line $r_0 = r$ in at most two points $r_e$, $r_u$, which may be
coincident. These points are determined by the equation

(4.18)  $r = \frac{\eta_0}{1 - f'(r)}$,

or, by (4.14),

(4.19)  $n b_n r^n + \dots + 2 b_2 r^2 - r + \eta_0 = 0$.

By Descartes' rule of signs [5], (4.19) has two or no positive solutions, which
establishes the assertion made above. If $r_e$, $r_u$ exist, $r_e \le r_u$,
then it is evident that (4.9) is satisfied for r such that

(4.20)  $r_e \le r \le r_u$.

The values of $r_e$, $r_u$ may be determined as accurately as necessary by
solving the simple scalar polynomial equation (4.19). This gives the following
result.

Theorem 4.2. If positive solutions $r_e \le r_u$ of equation (4.19) exist,
then a solution $h^* \in U(r_e)$ of equation (4.3) exists and is unique in
$U(r_u)$. Furthermore, the sequence $\{h_k\}$ defined by (4.5) converges to
$h^*$, with


(4.21)  $\|h^* - h_k\| \le \frac{[f'(r_e)]^k}{1 - f'(r_e)} \eta_0$,

$k = 0, 1, 2, \dots$.
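Theorem 4.2 reduces the existence question to finding the positive roots of the scalar equation (4.19). A minimal sketch of that computation follows; the function name, the dictionary layout for the bounds $b_i$, and the use of bisection are assumptions for illustration, and at least one $b_i$ is assumed to be positive.

```python
def majorant_radii(b, eta0):
    """Positive roots r_e <= r_u of g(r) = sum_i i*b_i*r^i - r + eta0.

    b maps each degree i >= 2 to a bound b_i >= 0; eta0 > 0.
    Returns (r_e, r_u), or None when g has no positive root.
    """
    if not any(bi > 0 for bi in b.values()):
        raise ValueError("need at least one positive b_i (nonlinear case)")

    def g(r):
        return sum(i * bi * r**i for i, bi in b.items()) - r + eta0

    def dg(r):
        return sum(i * i * bi * r**(i - 1) for i, bi in b.items()) - 1.0

    def bisect(fn, lo, hi):
        # Assumes fn changes sign between lo and hi.
        f_lo = fn(lo)
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            f_mid = fn(mid)
            if (f_mid > 0) == (f_lo > 0):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # g is convex for r >= 0, so its minimizer is the zero of dg.
    hi = 1.0
    while dg(hi) <= 0:
        hi *= 2.0
    r_min = bisect(dg, 0.0, hi)
    if g(r_min) > 0:
        return None            # (4.19) has no positive solution
    r_e = bisect(g, 0.0, r_min)
    hi = max(2.0 * r_min, 1.0)
    while g(hi) <= 0:
        hi *= 2.0
    r_u = bisect(g, r_min, hi)
    return r_e, r_u


print(majorant_radii({2: 0.5}, 0.1))   # quadratic case: r^2 - r + 0.1 = 0
print(majorant_radii({2: 2.0}, 0.1))   # no positive roots
```

When the function returns a pair, any radius between the two values satisfies (4.9) with $\mu = f'(r)$, which is the content of (4.20).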
This result gives a simple, computationally verifiable condition for the
existence and uniqueness of a solution of the abstract polynomial equation
(4.1) in the vicinity of a point $x_0$ at which P(x) is regular.

For (4.2) quadratic (n = 2), equation (4.19) becomes

(4.22)  $2 b_2 r^2 - r + \eta_0 = 0$,

which has real solutions if and only if

(4.23)  $1 - 8 b_2 \eta_0 \ge 0$,

or

(4.24)  $\eta_0 \le \frac{1}{8 \|B_2\|}$.
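As a worked illustration of the quadratic case (a sketch, writing $r_e$ and $r_u$ for the smaller and larger positive roots of (4.22)), the quadratic formula gives

```latex
r_e = \frac{1 - \sqrt{1 - 8 b_2 \eta_0}}{4 b_2}, \qquad
r_u = \frac{1 + \sqrt{1 - 8 b_2 \eta_0}}{4 b_2},
```

so both roots are real and positive exactly when $1 - 8 b_2 \eta_0 \ge 0$. For instance, the illustrative values $b_2 = 1/2$, $\eta_0 = 1/10$ (not taken from the text) give $r_e = (1 - \sqrt{0.6})/2 \approx 0.1127$ and $r_u = (1 + \sqrt{0.6})/2 \approx 0.8873$.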

5. Solution of abstract polynomial equations by Newton's method. Another
frequently used approach to the solution of nonlinear operator equations is
Newton's method in the generality obtained by L. V. Kantorovic [8, 13, 14]. For

(5.1)  $R(h) = B_n h^n + \dots + B_2 h^2 + h + B_0$,

one has the Fréchet derivatives

(5.2)  $R'(h) = n B_n h^{n-1} + \dots + 2 B_2 h + I$,

where I denotes the identity operator in X, and

(5.3)  $R''(h) = n(n-1) B_n h^{n-2} + \dots + 2 B_2$.

Newton's method for finding a solution $h^*$ of

(5.4)  $R(h) = 0$

in the vicinity of $h_0 = 0$ consists of formation of the sequence $\{h_k\}$
defined by

(5.5)  $h_0 = 0$,  $h_{k+1} = h_k - [R'(h_k)]^{-1} R(h_k)$,

$k = 0, 1, 2, \dots$, if this is possible.
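For a scalar polynomial the inverse $[R'(h_k)]^{-1}$ is simply a reciprocal, so the iteration (5.5) can be sketched as follows. The function name, coefficient layout, and stopping rule are assumptions for illustration.

```python
def newton_polynomial(B, tol=1e-14, max_iter=50):
    """Newton's method h_{k+1} = h_k - R(h_k) / R'(h_k), h_0 = 0,
    for R(h) = B_n h^n + ... + B_2 h^2 + B_1 h + B_0.

    B[i] is the coefficient B_i; B[1] = 1 gives the normalized form (5.1).
    """
    def R(h):
        return sum(c * h**i for i, c in enumerate(B))

    def dR(h):
        return sum(i * c * h**(i - 1) for i, c in enumerate(B) if i >= 1)

    h = 0.0
    for k in range(max_iter):
        slope = dR(h)
        if slope == 0.0:
            # Corresponds to "if this is possible": R'(h_k) must be invertible.
            raise ZeroDivisionError("R'(h_k) is not invertible")
        step = R(h) / slope
        h -= step
        if abs(step) <= tol:     # assumed stopping rule
            return h, k + 1
    raise RuntimeError("Newton iteration did not converge")


# Illustrative coefficients (not from the text): 0.5 h^2 + h + 0.1 = 0
print(newton_polynomial([0.1, 1.0, 0.5]))
```

On this illustrative equation Newton's method reaches the same root near the origin as successive substitution, in fewer steps.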
Since


or is unique in


Proofs of this theorem may be found in the literature [6, 10, 13, 14].

In the present case,

(5.15)  $f''(r) = n(n-1) b_n r^{n-2} + \dots + 6 b_3 r + 2 b_2$

is a scalar majorant polynomial for $R''(h)$ (see (4.12)-(4.14)).

Here,  one  looks  for  intersections  of  the  curve 
(5.  16)  r0  =n0f"(r) 


with  the  curve 
(5. 17) 


2Vr~  V 
2 


which will be solutions of the equation

(5.18)  $f''(r) r^2 - 2r + 2\eta_0 = 0$,

or, from (5.15),

(5.19)  $n(n-1) b_n r^n + \dots + 2 b_2 r^2 - 2r + 2\eta_0 = 0$.

By Descartes' rule of signs, equation (5.19) has two positive solutions
$r_e \le r_u$ or none. If $r_e$, $r_u$ exist, then it is evident that
$\eta_0 < r_e \le 2\eta_0$, and that (5.11)-(5.12) are satisfied with

(5.20)  $K = K_e = f''(r_e)$.

Also, if $r_u > 2\eta_0$, then (5.14) will be satisfied with

(5.21)  $K = K_u = f''(r_u)$.


This  establishes  the  following  result. 

Theorem 5.2. If positive solutions $r_e \le r_u$ of equation (5.19) exist,
then equation (5.4) has a solution $h^* \in U(r_e)$, to which the sequence
$\{h_k\}$ defined by (5.5) converges, with


r 


llh*  -hjji  i e 


■>.  22) 


-2  <p  sinh  <p 
e  _ _e 

.  .  _k 
sinh  2  <p 

e 


V. 


7--- — - —  =  1  +  cosh  ip 

f  (V\ 
$k = 0, 1, 2, \dots$. If $r_u > 2\eta_0$, then $h^*$ is unique in $U(r_u)$.
If $r_u = 2\eta_0$, then $h^*$ is unique in $U(2\eta_0)$.

Since the polynomial on the left side of (5.19) is positive for $r = 0$, and
goes to $+\infty$ as $r \to +\infty$, a simple sufficient condition for the
existence of $r_e$, $r_u$ is that it be nonpositive at $r = 2\eta_0$. This
gives the following theorem.

Theorem 5.3. If

(5.23)  $n(n-1) 2^{n-1} b_n \eta_0^{n-1} + \dots + 4 b_2 \eta_0 \le 1$,

then a solution $h = h^*$ of the abstract polynomial equation (5.4) exists and
is unique in $U(2\eta_0)$, and the sequence $\{h_k\}$ defined by (5.5)
converges to $h^*$.
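Condition (5.23) is a single inequality in the bounds $b_i$ and $\eta_0$, obtained by evaluating the left side of (5.19) at $r = 2\eta_0$ and dividing by $2\eta_0$. A small sketch of the check follows; the function name and the dictionary layout for the $b_i$ are assumptions for illustration.

```python
def newton_sufficient_condition(b, eta0):
    """Evaluate the left side of (5.23):
        sum over i of  i (i-1) 2^(i-1) b_i eta0^(i-1).

    b maps each degree i >= 2 to a bound b_i >= ||B_i||; eta0 = ||B_0||.
    Returns (condition_holds, left_side).
    """
    lhs = sum(i * (i - 1) * 2**(i - 1) * bi * eta0**(i - 1)
              for i, bi in b.items())
    return lhs <= 1.0, lhs


# Quadratic case reduces to 4 b_2 eta0 <= 1, as in (5.24).
print(newton_sufficient_condition({2: 0.5}, 0.1))
print(newton_sufficient_condition({2: 3.0}, 0.1))
```

The values used here are illustrative only; the first satisfies the condition and the second does not.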

Applying condition (5.23) to quadratic equations, one obtains

(5.24)  $4 b_2 \eta_0 \le 1$,

or

(5.25)  $\eta_0 \le \frac{1}{4 \|B_2\|}$.

Comparison of (5.25) with (4.24) seems to indicate that Newton's method is
far superior to the method of successive substitutions for the solution of
abstract quadratic equations. Actually, this is an illusion, as will be shown
in the next section.

Condition (5.23) may also be applied to scalar polynomial equations.
For example, if


(5.  26) 


R(h) 


one  has 
(5.  27) 


a  nd 


1 

100  ' 


50 


__1_ 

10  '  ^0 


1 


•> 


9 


]  97 
$k = 0, 1, 2, \dots$, where

(6.5)  $F(h) = -(B_n h^n + \dots + B_2 h^2 + B_0)$.

Remark 6.1. The modified form of Newton's method for the solution of the
abstract polynomial equation (6.1) is identical to the method of successive
substitutions (4.5).

This follows immediately by comparison of (6.5) with (4.4). Consequently,
an alternative condition for the convergence of the method of successive
substitutions is the following theorem [14] on the convergence of the modified
form of Newton's method.
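The identity in Remark 6.1 can be checked in one line. The sketch below assumes that the modified form of Newton's method keeps the derivative fixed at the starting point $h_0 = 0$ (the usual definition; the iteration formula itself is not reproduced in this part of the text), and uses $R'(0) = I$ from (5.2):

```latex
h_{k+1} = h_k - [R'(0)]^{-1} R(h_k)
        = h_k - \left( B_n h_k^n + \dots + B_2 h_k^2 + h_k + B_0 \right)
        = -\left( B_n h_k^n + \dots + B_2 h_k^2 + B_0 \right)
        = F(h_k).
```

Under that assumption each modified Newton step is exactly one step of the successive substitution iteration (4.5).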

Theorem 6.1 (Kantorovic-Tapia).


(6.6) 
for 
(6.  7) 
If 

(6.8) 


f (r )  ,  and  that 


1  R  "  (h )  H  <  K 


Kh0  <  a  • 


i  -  2Kn 


r  >  r  - 
—  e 


0 


then  a  solution  h  of  (6.  1)  exists  in  U(r  )  ,  and  the  sequence  {h,j  defined 

-!< 

by  (6.4)  converges  to  h  ,  with 

r 


(6.  V) 


ilh  -hk 11  -  fr('-  ^m'i 


ok  =  lilt 


1-N/rrZKTT/  fIk-l>  °0=1  » 


1,  2,  3,  ...  .  If 

(-•.  10) 


Pk  '  ^k-l  +KV  P1  "  °  ’ 

1  -  •s/i  -  zKn  A 


r  >  r  = 
~  u 


K 


ii.cn  h  is  unique  in  U(r^)  . 
It is evident that the hypotheses of Theorem 6.1 are satisfied if equation
(5.19) has positive solutions $r_e$, $r_u$ such that

(«-,  i  o  r  •  J-1  •; .  . 


One may then take


(•».  1  1) 

in  (h.  9),  and 
(«'•■  1  3) 
in  (6.  lu). 

Theorem  .6.  2.  If 

(6.  1  4 ) 


K  -  f"  (r 

e  o 


K  -  f"  (r  ) 
u  u 


n(n-l)2  bnn()  +...  +  9  1  , 


then solutions $r_e$, $r_u$ of equation (5.19) exist and satisfy (6.11).
Consequently, a solution $h^*$ of equation (6.1) exists in $U(r_e)$, and is
unique in $U(r_u)$. Furthermore, the sequence $\{h_k\}$ defined by (6.4)
converges to $h^*$, with


0. 


r  "y  <  .  .  \  Vr  4 1 

lih'  ~h,||  <  v;;uy,  ^  ]j 


lb. 15) 


°k  =  A1  *  1  -\l-2f,r(r 


Vl.  V1  ’ 


V. 


^-l+f,,,re,rV  “ 


Proof.  The  condition  (6.  14)  implies  that  the  Indicia  1  polynomial 

(it.  ii>) 

i  .  iuve  at  r  -  2n,  ,  which  means  that  positive  solutions  r  ,  r  of 

0  o  u 


\(r)  -  n(n-l)b  r1  t...4  2b  r  -  2r  +  2q 
n  2  o 


i  ’ , .  .  .  i 


2  CO 


X(rl  n 


. ...  •  aviation  (•'.  i‘/l)  *  l,  :>  1  i i> '■(■')  ‘  1  ’  :  \  (:  ■  !  '  •  :  •' 

..  tm  r::.  re,  (6.11)  is  satistu'd,  so  the*  coiyluj-ions  ot  tin  *  1 1 •  (>  !  !•  .v.r  i 
,■ .  •.-re...  1. 

>r k  o.  1.  It  follows  Li. at  (6.  11)  ( ind  also  (•  .  .'.■'ii  w 1 1 i  i  salislien 
ally  small,  say  r  *"  n  ,  Thi..  means  trial  the  s-.hhion  h  h 

. ...itie...  (6.  1)  depends  continuously  on  B(j  in  the  rm  Hjhiionmod  of  B  -  6  , 

,  -  0  satisfies  the  homogeneous  equation 


(  <>.  ib)  it  hn  K  .  .  +  U  ,h  +  h  =  0  , 

n  c 

y- 

lor  any  t  >  0  ,  a  solution  h  -  h  c >f  (6.  1)  exists  such  that 

•  i  ;j  ‘f’  ti 

i  1'.' )  ll h  ||  :_  < 

ivivlud 

(’■■  ■t.))  ii  Bq  H  i.  g  - 

A.,  noted  above,  ( *1.  19)  and  (5.19)  provide  alternative  conditions  for  il 
•  ..vi  rj«.  nee  of  the  me  tin  •*!  {  -i .  6)  .  jf  r.ucf  suive  substitutions.  Aetna  Hy.  ( 


van i ton  in  b  e  form. 


hi) 


N.  (  r) 


n(  n - 1 ) 


b  r"  +. 


.  +  b  ,r  --  r  + 


--  0 


vi  11  o.  die  integral  multiplier:;  of  the  coefficients  av  divisible  by 

•-)  6  (r)  s  nb  rn  +...  i  Zb.a  -  r  •  v  , 

n  n  i.  ()  ’ 


i  <  i 


i  )  N  (  r)  ,  r  •-  g  , 

'i  1 1 

a  =  d,  i,  but  for  u  >  i  ,  It  !.  il,h  So.;  y  ( r)  ,,  (  r) 

n  n 


.a  1 

a  (  r )  —  ,i  (  r)  v  ~  'nr  + .  .  i  .. , .•  .  i . 

n  a  a  n  T 


■  mix  *  for  r  ..a . .  f .  e.  •  ally 


,\i  ...  .if.  t>.  j.  c  naif;.  ..  (  ••>.  !  ’)  gives  better  estimates  than  (4.  19)  of  the 
the  metawb  .  f  sucre save  substitutions  for  quadratic  and  cubic 
a.  1:1 ...  If  b,  -  o  ,  tnen  the  situation  is  reversed  for  equations  of  degree 
.  r  *  t  r  < . ;  .  1 1  r. 


7. Systems of polynomial equations. An important special case for the
analysis given above occurs if $X = R^m$, the space of m-dimensional real
vectors $x = (\xi_1, \xi_2, \dots, \xi_m)$. One could equally well consider
$X = C^m$, the space of m-dimensional complex vectors
$z = (\zeta_1, \zeta_2, \dots, \zeta_m)$, but it is usual in computational
practice to embed complex problems in real spaces with twice the dimension
of the complex space.

T  r,ITl  . 

iU  K  ,  an  n- linear  operator  may  be  represented  by  an  array 


(  T  .  i ) 


o  ~  (  v.  .  .  .  )  , 


•  __  i  }  c  nf i  , 

,  •  ■  •  •  >  Jn  -  1  >  •  •  •  *  -s  ‘  mi  sting  of  m  element..'; .  The  following 

'.'or, ii.. n  will  be  adopted  for  ;no  formation  of  the  (n-i  )-linear  operator  Nx  : 


\'x  - 


1JiJ2***  Vl’n 


a  Ji 


( f  ? 

t  ..j .  • 


...  h  j 


s',  i r.N  i  . 


e  condition  for  the  operator  N  defined  by  (7.  1) 


'-i»e  ,s  that 


i  ■  •  o 


V/  -  v> 

1  J .  J  >  ■  •  •  )n  1 J,  J,  . .  .  j, 

*  “  a  k  k  k 

1  Z  r 


is  any  peimutatmn  >f  the  m- 

KZ  n 


'  L  ) i  i  ! t >  •  •  •  i  i  <.  rn  .  Thus,  the  number  of 
t,  i  z  n 


>i  ,  > 


distinct elements of a symmetric n-linear operator (7.1) is significantly
smaller than for an arbitrary n-linear operator.

One important source of symmetric n-linear operators in $R^m$ is indicated
in the following remark.

Remark 7.2. If the operator F in $R^m$ defined by

(7.4)  $F(x) = (f_1(x), f_2(x), \dots, f_m(x))$,

where

(7.5)  $f_i(x) = f_i(\xi_1, \xi_2, \dots, \xi_m)$,

$i = 1, 2, \dots, m$, is differentiable n times, then its nth (Fréchet)
derivative at $x = x_0 = (\xi_1^{(0)}, \xi_2^{(0)}, \dots, \xi_m^{(0)})$ is the
symmetric n-linear operator

(7.6)  $F^{(n)}(x_0) = \left( \frac{\partial^n f_i}
        {\partial \xi_{j_1} \partial \xi_{j_2} \cdots \partial \xi_{j_n}}
        \right)_{x = x_0}$,

$i, j_1, j_2, \dots, j_n = 1, 2, \dots, m$.


In $R^m$, abstract polynomial equations are evidently systems of m algebraic
polynomial equations in the m unknowns $\xi_1, \xi_2, \dots, \xi_m$. The
systems can arise directly in applications, or be approximations to equations
involving operators F(x) in $R^m$ which have power series expansions at
$x = x_0$. Another source of finite polynomial systems is the discretization
of polynomial operator equations in infinite-dimensional spaces. This is
usually done in the case of differential equations by using finite differences
as approximations to derivatives. For integral equations, corresponding finite
systems are often obtained by the use of a numerical integration rule which
replaces integrals by finite sums. Finite polynomial systems may also be
obtained by taking a segment of an infinite system, or by other approximation
techniques applied to equations in infinite-dimensional spaces.

There are many topologically equivalent ways of introducing a norm in $R^m$, of which

(7.7)    $\|x\| = \max_{(i)} |\xi_i|$

is perhaps the simplest from a computational standpoint. The space $R^m$ with the norm (7.7) will be denoted by $R^m_\infty$.

For a linear operator (matrix) $L = (\lambda_{ij})$ in $R^m_\infty$, one has

(7.8)    $\|L\| = \max_{(i)} \sum_{j=1}^{m} |\lambda_{ij}|$.

For the n-linear operator N defined by (7.1) with n > 1,

(7.9)    $\|N\| \le \max_{(i)} \sum_{j_1=1}^{m} \cdots \sum_{j_n=1}^{m} |\nu_{i j_1 j_2 \cdots j_n}|$,

but equality is not necessarily attained.

The norm in $L(R^m_\infty, R^m_\infty)$ is thus defined by (7.8). For a bilinear operator $B = (\beta_{ijk})$, one has

(7.10)    $\|B\| = \sup_{\|x\|=1} \max_{(i)} \sum_{j=1}^{m} \left| \sum_{k=1}^{m} \beta_{ijk} \xi_k \right|$,

from which it follows at once that

(7.11)    $\|B\| \le \max_{(i)} \sum_{j=1}^{m} \sum_{k=1}^{m} |\beta_{ijk}|$,

which is (7.9) for n = 2. The general case may now be established by mathematical induction.
Bilinear operators $B = (\beta_{ijk})$ in $R^2_\infty$ may be written in the form

(7.12)    $B = \begin{pmatrix} \beta_{111} & \beta_{112} & \beta_{121} & \beta_{122} \\ \beta_{211} & \beta_{212} & \beta_{221} & \beta_{222} \end{pmatrix}$.

For the operator

(7.13)    $B = \begin{pmatrix} 2 & -1 & -1 & -2 \\ 1 & 0 & 0 & 1 \end{pmatrix}$,

the estimate (7.11) gives

(7.14)    $\|B\| \le 6$,

while direct application of (7.10) shows that

(7.15)    $\|B\| = 4$.

Consequently, inequality (7.9) may be strict if n > 1.
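The two numbers in this example can be checked mechanically. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and the exact norm is found by enumerating the sign vectors of the unit cube, which relies on the observation that each row functional in (7.10) is convex in x and therefore attains its maximum over the max-norm unit ball at a vertex.

```python
from itertools import product

def row_sum_bound(B):
    """Estimate (7.11): largest sum of |beta_ijk| over j and k, taken over i."""
    return max(sum(abs(b) for row in Bi for b in row) for Bi in B)

def exact_norm(B):
    """Norm (7.10) in the max norm, evaluated at the vertices of the unit cube."""
    m = len(B)
    best = 0
    for x in product((-1, 1), repeat=m):
        for Bi in B:
            # sum over j of | sum over k of beta_ijk * xi_k |
            value = sum(abs(sum(b * xk for b, xk in zip(row, x))) for row in Bi)
            best = max(best, value)
    return best

# B[i][j][k] holds beta_ijk for the example operator above
B = [[[2, -1], [-1, -2]],
     [[1, 0], [0, 1]]]

print(row_sum_bound(B), exact_norm(B))
```

The vertex enumeration costs $2^m$ evaluations, so it is only a check for small m; the row-sum estimate is the quantity that scales to larger systems.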


In actual computation, it is very easy to program a computer to produce the numbers

(7.16)    $b_k = \max_{(i)} \sum_{j_1=1}^{m} \cdots \sum_{j_k=1}^{m} |\beta^{(k)}_{i j_1 j_2 \cdots j_k}|$,

k = 2, 3, ..., n, given the multilinear operators in $R^m_\infty$,

(7.17)    $(\beta^{(k)}_{i j_1 j_2 \cdots j_k})$,

k = 2, 3, ..., n. Thus, the construction of scalar majorant polynomials can be automated. It is not essential to obtain representations of the multilinear operators entering into the equation in the form (7.17) in order to evaluate the bounds (7.16). If one is given the system

(7.18)    $P_i(\xi_1, \xi_2, \ldots, \xi_m) = 0$,    i = 1, 2, ..., m,

of polynomial equations of degree n in $R^m_\infty$, then one may find the coefficients of the terms of degree k in the ith equation, i = 1, 2, ..., m, indexed by $j = 1, 2, \ldots, q_{ik}$, where $q_{ik}$ is the number of such terms. Then,
A particular quadratic system of equations of importance in the study of matrices is the characteristic value-vector equation. Given an m x m matrix $A = (\alpha_{ij})$, a solution $x = (\xi_1, \ldots, \xi_m, \lambda)$ is sought for the system

(7.25)    $\sum_{j=1}^{m} \alpha_{ij} \xi_j - \lambda \xi_i = 0$,    i = 1, 2, ..., m,

          $\sum_{j=1}^{m} \xi_j^2 - 1 = 0$.

8. Polynomial differential and integral equations. Many of the integral and differential equations of interest in applications are of polynomial type [12]. This is true, for example, of almost all of the equations listed in Chapter 1 of the book by H. T. Davis [4]. Davis also devotes Chapter 8 of his book [4] to second order differential equations of polynomial class. In the case of ordinary differential equations, the famous Riccati equation

(8.1)    $\frac{dy}{dx} + Q(x)y + R(x)y^2 = S(x)$,    $y(0) = c$,

is quadratic. If the coefficient functions Q, R, S are assumed to be continuous, then

(8.2)    $P(y) = \frac{dy}{dx} + Q(x)y + R(x)y^2 - S(x)$

may be regarded as an operator from the space C'[0, X] of continuously differentiable functions on $0 \le x \le X$ into the space C[0, X] of continuous real functions on the same interval.

The transformation of (8.1) into the standard form (2.17) may be carried out very simply. Choose $y_0 = y_0(x)$ in C'[0, X] such that

(8.3)    $y_0(0) = c$,

and set

(8.4)    $y(x) = y_0(x) + h(x)$.

The equation for the new unknown function h(x) is

(8.5)    $R(x)h^2 + \frac{dh}{dx} + [Q(x) + 2R(x)y_0(x)]h + \left\{ \frac{dy_0}{dx} + Q(x)y_0 + R(x)y_0^2 - S(x) \right\} = 0$,    $h(0) = 0$.


This is equivalent to the Taylor identity (2.25). In order for the polynomial operator (8.2) to be regular at $y = y_0$, one must be able to invert the linear differential operator

(8.6)    $P'(y_0) = \frac{d}{dx} + [Q(x) + 2R(x)y_0(x)]I$

with the homogeneous initial condition. The inverse of (8.6) is the linear integral transform with kernel

(8.7)    $K(x, t) = \exp\{p(t) - p(x)\}$,

where

(8.8)    $p(s) = \int_0^s [Q(u) + 2R(u)y_0(u)]\, du$.


Thus, (8.5) is equivalent to the nonlinear Volterra integral equation

(8.9)    $0 = \int_0^x K(x, t) R(t) h^2(t)\, dt + h(x) + g(x)$,    $0 \le x \le X$,

where

(8.10)    $g(x) = \int_0^x K(x, t) \left\{ \frac{dy_0(t)}{dt} + Q(t)y_0(t) + R(t)y_0^2(t) - S(t) \right\} dt$.

Equation (8.9) may be considered to be posed in the Banach space $C_0'[0, X]$ of continuously differentiable functions on $0 \le x \le X$ which vanish at x = 0. A possible norm in this space is

(8.11)    $\|f\| = \max_{[0, X]} \{ |f(x)|, |f'(x)| \}$.


It is evident from (8.7), (8.8), and (8.10) that (5.25) (or (6.14) for n = 2) will be satisfied if X is sufficiently small.

Higher degree polynomial integral equations of Volterra type have been studied by Lalesco [9]. In the case of boundary-value problems, a technique similar to the above yields polynomial integral equations of Fredholm type by use of the appropriate Green's function. Such algebraic integral equations have received the attention of E. Schmidt [15]. The treatment of polynomial partial differential equations may be carried out in an analogous way. For example, the two-dimensional Navier-Stokes equations [9],


(8.  1  ,?.) 


Du 

tit 


dll  dll 

+  ub6T  +  vb7 


=  x  - 


p  fix 


+  li 


d  II 


r)X 


Y 


1  Bp 
p 


p 

p 


y 


=  Meu]  f  ill pv) 

Bt  dx  By 


form a quadratic system in the incompressible case ($\rho$ = constant). Together with (8.12), one assumes that the initial values u(x, y, 0), v(x, y, 0) are known, and on the boundary $\partial G$ of some region G, v(x, y, t) and u(x, y, t) are specified. The body forces X, Y and the viscosity are also assumed to be given for (8.12) to be well posed.
Polynomial integral equations are often obtained by transformation of differential equations, as in the example of the Riccati equation. They also arise directly in applications, an example being the equation of Chandrasekhar

(8.13)    $H(\mu) = 1 + \mu H(\mu) \int_0^1 \frac{\Psi(\mu')}{\mu + \mu'} H(\mu')\, d\mu'$

which occurs in the mathematical theory of radiative transfer [3]. For $H(\mu) \in C[0,1]$, the space of continuous functions on $0 \le \mu \le 1$, condition (5.25) will be satisfied if


(8.  14) 


max 

[0,1] 


I 


dp' 


9 


as $\Psi(\mu) > 0$ in the problems considered.


ROUNDING


J.  M.  Yohe 

Mathematics  Research  Center 
University  of  Wisconsin 
Madison,  Wisconsin 


1. Introduction.


There has been a great deal of work done in the areas of rounding, floating point arithmetic, and approximation of real numbers by computer representable numbers; it might seem as though we were beating a dead horse. However, to our knowledge, no manufacturer yet builds a computer which performs these functions properly, at least not by our definition.

In this paper, we sketch our definition of "proper" floating point hardware, indicate how it can be implemented, present the appropriate formulas for a priori error analysis based on this hardware design, and survey some of the basic applications of this arithmetic. Much of the material in this paper is treated in greater detail in [3], [6], [7], and [8]. The a priori error bounds and the extension of [6] to radix complement and sign-magnitude arithmetic, while easily deduced from [6], are not presented explicitly elsewhere. A thorough discussion of floating point arithmetic, including some of the ideas presented here, can be found in Knuth [1]; however, he does not deal with directed roundings, which we feel are essential to proper operation of a computer.

Throughout this paper, we will assume that our computer operates in the base B number system. A floating point number is a pair (E,F), where E is an m-digit signed exponent (power of B) and F is an n-digit signed fraction. Since the size and particular representation of E have no bearing on accuracy apart from limiting the size of the largest and smallest machine numbers, we will not concern ourselves with the exponent here. In all subsequent remarks which deal with number representations, it is to be understood that the statements are true only within the range permitted by the exponent size.

The floating point number (E,F) represents the number $B^E \times F$, where

    $F = \pm \sum_{i=1}^{n} a_i B^{-i}$,    $0 \le a_i < B$.    (1.1)
It is clear from (1.1) that any number which can be expressed as an n-digit base B fraction times an m-digit power of B is a machine representable number, and no other numbers are representable. Since m and n are fixed, only finitely many different real numbers are representable. Any non-representable number r must be approximated by a machine number; if $m_1$ and $m_2$ are the two consecutive machine numbers such that $m_1 < r < m_2$, and r is approximated by either $m_1$ or $m_2$, then it is clear that the approximation is subject to an error of the order of $B^{-n}$. This is sometimes referred to as the basic machine precision.

A floating point number is said to be normalized if $a_1 \ne 0$; we will
assume  that  all  floating  point  numbers  are  normalized,  since  maximum 
accuracy  is  maintained  by  use  of  normalized  numbers.  In  this  terminology, 
zero  is  a  special  case;  we  will  assume  that  it  is  expressed  by  a  zero 
fraction  and  the  smallest  possible  exponent;  we  will  admit  zero  as  an 
exception  to  the  rule  that  all  numbers  must  be  normalized. 

2. Roundings and Directed Roundings.

An axiomatic approach to computational rounding has been given by Kulisch in [2]. For the sake of completeness, we sketch some of the points of his theory here. We do not state the theory in full generality; those interested in further information along these lines should consult [2].

Let $\mathbb{R}$ be the real number system, and let $\mathbb{M}$ be the set of machine representable numbers. A mapping $\Box : \mathbb{R} \to \mathbb{M}$ is said to be a rounding if, for all $a, b \in \mathbb{R}$ we have

    $\Box a \le \Box b$ whenever $a \le b$.

A rounding is called optimal if for all $a \in \mathbb{M}$, $\Box a = a$. In practice, this must be true for any reasonable representation of a, which must certainly include any representation the computer might manufacture during an intermediate stage of an arithmetic operation. The definition of optimal rounding implies that if $a \in \mathbb{R}$ and $m_1, m_2$ are consecutive members of $\mathbb{M}$ with $m_1 < a < m_2$, and if $\Box : \mathbb{R} \to \mathbb{M}$ is an optimal rounding, then either $\Box a = m_1$ or $\Box a = m_2$.

A rounding is downward directed (upward directed) if, for all $a \in \mathbb{R}$, we have $\Box a \le a$ ($\Box a \ge a$). A rounding is symmetric if $\Box a = -\Box(-a)$.

If $\Box$ is a rounding, a, b are machine numbers, and * is an arithmetic operation, then by $a \mathbin{\boxed{*}} b$ we will mean $\Box(a * b)$.
By Theorem 1 of [2], optimal directed roundings are unique. We denote the optimal upward directed rounding by $\Delta$, the optimal downward directed rounding by $\nabla$, and the symmetric rounding which takes each real number to the closest machine number (rounding to the next machine number whose magnitude is larger if there is a tie) by $\bigcirc$.
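The three roundings can be modelled exactly with rational arithmetic. The sketch below is an illustration only, not the hardware algorithm of [6]: the function names are hypothetical, the exponent range is taken to be unbounded, and the defaults (base 10, three-digit fraction) are chosen to match the decimal examples used later in the paper.

```python
from fractions import Fraction
import math

def neighbors(x, base=10, n=3):
    """Return (lo, hi), the machine numbers nearest x with lo <= x <= hi."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0), Fraction(0)
    e = 0
    while abs(x) >= Fraction(base) ** e:        # find e with base**(e-1) <= |x| < base**e
        e += 1
    while abs(x) < Fraction(base) ** (e - 1):
        e -= 1
    ulp = Fraction(base) ** (e - n)             # spacing of n-digit fractions at this exponent
    return math.floor(x / ulp) * ulp, math.ceil(x / ulp) * ulp

def round_down(x, base=10, n=3):
    """Optimal downward directed rounding."""
    return neighbors(x, base, n)[0]

def round_up(x, base=10, n=3):
    """Optimal upward directed rounding."""
    return neighbors(x, base, n)[1]

def round_nearest(x, base=10, n=3):
    """Symmetric rounding to the closest machine number; ties go to larger magnitude."""
    x = Fraction(x)
    lo, hi = neighbors(x, base, n)
    if x - lo != hi - x:
        return lo if x - lo < hi - x else hi
    return hi if x > 0 else lo

x = Fraction(9005, 100000)                      # .09005, exactly halfway between .0900 and .0901
print(round_down(x), round_up(x), round_nearest(x))
```

Because a machine number is its own lower and upper neighbor, all three functions return it unchanged, which is the optimality condition; the tie rule makes `round_nearest` symmetric.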

3. Floating point hardware design.

By our definition, proper hardware design would enable the computer to perform any of the roundings $\Delta$, $\nabla$, and $\bigcirc$ at the user's option. The rounding $\bigcirc$ is most frequently used, since it produces maximal accuracy. However, the roundings $\Delta$ and $\nabla$ are used in implementation of interval arithmetic, for example, and in certain other situations; we will discuss applications briefly in Section 5. We will refer to floating point arithmetic which provides for all three of these roundings as "Best Possible" floating point arithmetic. The following is a brief sketch of the theory presented in [6].


What information does our computer need in order to round a real number properly? It clearly needs the first n digits of the appropriate base B fraction. Moreover, in order to be able to round to the nearest machine number (by our definition of such rounding) it needs the (n+1)st digit of the fraction. Finally, in order to obtain a correct upward or downward directed rounding, it needs an indicator to tell us whether there are any nonzero digits in the remainder of the fraction.
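A small sketch shows that these three items are enough to decide every rounding. It is an illustration under stated assumptions, not the paper's hardware logic: the function name and argument layout are hypothetical, the magnitude and sign are passed separately, and the base is assumed even so that digit n+1 alone settles rounding to nearest.

```python
def round_digits(digits, next_digit, sticky, negative, mode, base=10):
    """Round a magnitude from its first n digits, digit n+1, and a sticky flag.

    Returns the rounded magnitude in units of the last retained digit.
    A result equal to base**n signals a carry needing one renormalizing shift.
    Assumes an even base.
    """
    value = 0
    for d in digits:
        value = value * base + d
    inexact = next_digit != 0 or sticky          # any nonzero digit beyond position n
    if mode == "nearest":
        bump = 2 * next_digit >= base            # ties go to the larger magnitude
    elif mode == "up":
        bump = inexact and not negative          # toward +infinity grows positive magnitudes
    elif mode == "down":
        bump = inexact and negative              # toward -infinity grows negative magnitudes
    else:
        raise ValueError(mode)
    return value + (1 if bump else 0)

# The fraction .90050 seen as digits 9,0,0 followed by 5, nothing nonzero after it
print(round_digits([9, 0, 0], 5, False, False, "nearest"),
      round_digits([9, 0, 0], 5, False, False, "down"),
      round_digits([9, 0, 0], 5, False, False, "up"))
```

Note that the sticky flag is never consulted for rounding to nearest, and digit n+1 matters to the directed roundings only through whether it is zero.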


The result of an arithmetic operation combining two machine numbers
is  not,  in  general,  a  machine  number;  we  must  usually  approximate  the 
answer.  In  order  to  assure  ourselves  that  our  computer  has  all  of  the 
above  information  at  the  conclusion  of  an  arithmetic  operation,  we  must 
design  it  to  preserve  even  more  information  during  the  execution  of  the 
operation.  We  will  illustrate  this  by  means  of  a  floating  point  decimal 
representation  which  uses  a  three  digit  fraction  and  sign-magnitude  repre¬ 
sentation  for  negative  numbers.  We  will  confine  our  discussion  to  addition, 

.1  t  ^  n  1  {  ^  \  ■  -  *-•  4  *!*'**  '  ’  •Nr' 


f,  i.n*  • 


.  i 


We will also need two guard digits at the right-hand end of the register to preserve information which is shifted out of the right-hand end of the three-digit fraction. These guard digits are appended to the three fractional digits to form a five digit fraction; all five digits participate fully in the addition. The initial value of the guard digits is, of course, zero; in (B-1)'s complement arithmetic, zero is expressed as zero if the number is positive and as (B-1) if the number is negative. The need for one guard digit is self-evident; the need for two is illustrated by the following problem:

     $.100 \times 10^{0}$
    $-.995 \times 10^{-2}$

This should be computed as follows in sign-magnitude arithmetic:

     .10000
    -.00995
    =.09005

Normalization now yields $.90050 \times 10^{-1}$; the former second guard digit is now the first guard digit, and is necessary for proper rounding.

One  might  wonder  whether  an  unlimited  number  of  guard  digits  would  be 
necessary.  The  following  theorem  shows  that  two  guard  digits  are  always 
sufficient  to  preserve  maximal  accuracy: 

Theorem 3.1: If more than one position of left shift is required to
normalize  the  result  of  an  addition,  then  at  most  one  position  of  right 
shift  was  required  to  equalize  the  exponents. 

The  proof  of  Theorem  3.1  is  given  in  [6]. 
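The effect of the second guard digit in the problem above can be reproduced with integer arithmetic. This is a minimal sketch with hypothetical names, not the accumulator algorithm of [6]; it assumes the first operand has the larger magnitude and exponent, truncates whatever is shifted past the guard digits, and omits the indicator for shifted-off digits.

```python
def subtract_aligned(a, a_exp, b, b_exp, guards, n=3, base=10):
    """Form a*base**(a_exp-n) - b*base**(b_exp-n) in an (n+guards)-digit register.

    a and b are n-digit fractions held as integers (100 means .100).
    Returns (digits, exp): the normalized register contents and the exponent.
    """
    shift = a_exp - b_exp
    a_reg = a * base ** guards
    b_reg = (b * base ** guards) // base ** shift    # digits shifted past the guards are lost
    diff = a_reg - b_reg
    exp = a_exp
    while 0 < diff < base ** (n + guards - 1):       # left shifts to normalize
        diff *= base
        exp -= 1
    return diff, exp

for guards in (1, 2):
    digits, exp = subtract_aligned(100, 0, 995, -2, guards)
    # dropping the guard digits gives the would-be downward rounded fraction
    print(guards, digits, exp, digits // 10 ** guards)
```

With one guard digit the truncated fraction is 901, which exceeds the true difference .09005 and so is not a valid lower bound; with two guard digits the register holds 90050 and truncation gives the correct 900.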

The two guard digits are denoted by GG in Figure 1.

The final item of information needed is an indicator to show whether any nonzero digits were shifted off during equalization of the exponents. This indicator can be a single binary digit, and is denoted by I in Figure 1.

Figure 1. The structure of the accumulator


Explicit algorithms for Best Possible floating point arithmetic are presented in [6]. These algorithms are stated for (B-1)'s complement arithmetic, but they are, in fact, perfectly general. In order to use them for sign-magnitude or B's complement arithmetic we need only set the guard digits to zero upon loading a number into the C.P.U., regardless of the sign of the number. A person wanting to implement Best Possible floating point arithmetic on a sign-magnitude or B's complement machine might think that modification of the algorithms would be necessary, because the proof of the algorithm is given for (B-1)'s complement arithmetic. We sketch the reasoning for the other forms of arithmetic below.

The  algorithm  in  [6]  requires  that  the  number  with  the  smaller  magni¬ 
tude  be  made  positive  before  addition  is  begun.  This  number  is  then 
placed  in  the  accumulator  and  shifted  right  the  requisite  number  of  places 
to  equalize  the  exponents;  if  any  nonzero  digits  are  shifted  out  of  the 
low-order  guard  digit,  I  is  set  to  1. 

The  sum  is  now  formed;  it  is  negative  unless  it  is  identically  zero. 
However,  if  I  is  nonzero,  it  represents  a  positive  correction,  since  the 
number  which  caused  I  to  be  set  to  1  was  positive.  Thus  the  correct  result 
lies  between  the  number  we  have  generated  and  the  next  larger  number 
representable  on  the  machine.  The  next  larger  number  is  clearly  the 
negative  number  whose  magnitude  is  next  smaller  than  the  magnitude  of  the 
number  we  have  generated. 

In  order  to  proceed  with  the  algorithm,  we  need  to  obtain  a  lower 
bound  for  the  magnitude  of  the  result;  the  magnitude  of  what  we  have 
obtained  is  an  upper  bound.  Consequently,  if  I  is  nonzero,  indicating 
that  the  lower  and  upper  bounds  are  not  the  same,  we  add  one  into  the 
low  order  digit  position,  which  decreases  the  magnitude  of  the  number, 
and then make it positive; we can now proceed with the algorithm. (Of course, we record all of these sign changes so we can apply any necessary correction at the completion of the operation.)

As an example of this, let us consider the following problem:

     $.100 \times 10^{0}$
    $-.990 \times 10^{-4}$

In 9's complement arithmetic, the computation is as follows (the indicator I is shown as the 1 two spaces to the right of the low-order guard digit).

       .00009  1
    + -.89999
    = -.90008  1      Form sum
    +       1
    = -.90009  1      Add 1 to result
      +.09990  1      Complement

In 10's complement arithmetic the computation looks like this:

       .00009  1
    + -.90000
    = -.90009  1      Form sum
    +       1
    = -.90010  1      Add 1 to result
      +.09990  1      Complement

In sign-magnitude arithmetic, we have

       .00009  1
    + -.10000
    = -.09991  1      Form sum
    +       1
    = -.09990  1      Subtract 1 from magnitude (i.e., add 1 to result)
      +.09990  1      Negate

The  results  of  the  three  types  of  arithmetic  are  identical  —  which 
they  should  be,  since  positive  numbers  are  expressed  the  same  way  no 
matter  which  type  of  arithmetic  we  are  using.  The  floor  and  ceiling  values 
for the sum are, respectively, $.999 \times 10^{-1}$ and $.100 \times 10^{0}$, which is
exactly  what  we  would  expect  from  any  of  the  three  methods.  Hence  Best 
Possible  floating  point  arithmetic  could  be  implemented  on  any  hardware, 
regardless  of  the  arithmetic  scheme  used. 


Although  we  have  avoided  any  mention  of  exponent  overflow  and  under¬ 
flow  conditions,  proper  hardware  design  should  include  proper  handling  of 
out-of-range  numbers.  This  includes  an  interrupt  upon  occurrence  of  the 
error  condition,  together  with  a  complete  set  of  indicators  to  tell  the 
user exactly what went wrong. Details of such a scheme are given in [6].

One further word about hardware is appropriate: if the machine operates in the base $B \ne 10$, then the hardware ought to provide facilities for conversion between base B and 10. If the hardware is designed to do
arithmetic  operations  and  rounding  in  the  manner  described  here,  accurate 
conversions  —  at  least  from  base  10  to  base  B  —  should  also  be  relatively 
easy  to  incorporate.  This  is  discussed  in  detail  in  [8],  and  we  will 
explore  it  no  further  here. 
4. A priori error analysis.


The standard reference work on a priori error estimates for floating point arithmetic is Wilkinson's 1963 paper [5]. For multiplication and
division,  Wilkinson's  error  estimates  are  as  natural  as  can  be  expected; 
however,  in  the  case  of  addition  (and  subtraction)  the  capriciousness  of 
computer  designers  made  it  necessary  to  produce  a  rather  unnatural  and 
somewhat  intractable  error  estimate  in  order  to  reflect  the  realities  of 
the  situation.  Here,  we  will  see  that  Best  Possible  floating  point 
arithmetic  yields  a  more  natural  and  more  tractable  error  bound  for 
addition  than  can  be  hoped  for  if  the  computer  produces  less  than  optimal 
accuracy. 

Throughout  this  section,  as  in  the  rest  of  the  paper,  we  assume  that 
neither  overflow  nor  underflow  occurs  during  arithmetic  operations.  Over¬ 
flow  is  almost  invariably  a  fatal  (to  the  computation)  error;  underflow 
can  often  be  tolerated,  and  calculation  can  proceed  with  a  zero  replacing 
the  underflowed  quantity.  (All  the  same,  underflow  should  at  least 
optionally  trigger  an  error  indicator  or  interrupt  so  that  it  can  be 
detected;  the  absence  of  automatic  error  detection  on  underflow  can  lead 
to  failure  to  recognize  invalid  computational  results!)  Replacing  an 
undersized  result  with  zero  complicates  the  error  analysis;  this  has  been 
considered  by  Schoenfeld  in  [4]. 

If * is any of the four arithmetic operations in the real field, then by $*_M$ we will denote the machine approximation to *. The constants $\mu$, $\delta$, and $\alpha$ which appear in the following formulas are determined by the hardware design, but are essentially of the order of $B^{-n}$. In each formula, $\theta$ is a constant in the range $-1 \le \theta \le 1$ which depends on the operation and the operands; to keep notation uncluttered, we will not reflect this dependency.

Wilkinson's a priori formulas are as follows:

    $x \cdot_M y = (x \cdot y)(1 + \theta\mu)$    (4.1)

    $x \div_M y = (x \div y)(1 + \theta\delta)$    (4.2)

    $x +_M y = x(1 + \theta\alpha) + y(1 + \theta'\alpha)$    (4.3)

These formulas imply that

    $|x \cdot_M y - x \cdot y| \le |x \cdot y|\,\mu$    (4.4)

    $|x \div_M y - x \div y| \le |x \div y|\,\delta$    (4.5)

and

    $|x +_M y - (x + y)| \le \alpha(|x| + |y|)$    (4.6)


Typical values for the constants $\mu$, $\delta$, and $\alpha$ are $B^{1-n}$ and $2B^{-n}$, depending on the design of the hardware.


If we are using Best Possible arithmetic, however, we can regard our machine M as being three machines, known as $\bigcirc$, $\Delta$, $\nabla$, which perform the roundings $\bigcirc$, $\Delta$, and $\nabla$ (described in Section 2) respectively. For each of these machines, we have $x *_M y = \Box(x * y)$, where $\Box$ is the rounding performed by that machine, whenever * is one of the four arithmetic operations. Moreover, we can replace (4.3) with the formula

    $\Box(x + y) = (x + y)(1 + \theta\alpha)$    (4.3')

which implies that

    $|\Box(x + y) - (x + y)| \le |x + y|\,\alpha$    (4.6')


These estimates are clearly more natural and perhaps more aesthetically pleasing than (4.3) and (4.6). Moreover, in the case of rounding, we have

    $\mu = \delta = \alpha = \tfrac{1}{2} B^{-n}$,

while in the cases of upper and lower bounds,

    $\mu = \delta = \alpha = B^{-n}$.

Explicitly, we have

    $\bigcirc(x * y) = (x * y)(1 + \theta B^{-n})$,    $-\tfrac{1}{2} \le \theta \le \tfrac{1}{2}$    (4.7)

    $\Delta(x * y) = x * y + \theta |x * y| B^{-n}$,    $0 \le \theta < 1$    (4.8)

    $\nabla(x * y) = x * y - \theta |x * y| B^{-n}$,    $0 \le \theta < 1$    (4.9)

which imply that

    $|x * y|(1 - \tfrac{1}{2} B^{-n}) \le |\bigcirc(x * y)| \le |x * y|(1 + \tfrac{1}{2} B^{-n})$    (4.10)

    $x * y \le \Delta(x * y) < x * y + |x * y| B^{-n}$    (4.11)

    $x * y - |x * y| B^{-n} < \nabla(x * y) \le x * y$    (4.12)

Moreover,  it  should  be  noted  that  whenever  the  result  of  any  operation 
is  a  machine  representable  number,  then  the  result  of  the  operation  is 
exact. This, unfortunately, is not always the case with present day hardware.
As an extreme example, let us consider the case of a machine with a ten digit fraction, operating in the decimal number system; let us assume that $\alpha = \tfrac{1}{2} \times 10^{-10}$. Then, if we have the following addition

     $.1000000000 \times 10^{1}$
    $-.9999999999 \times 10^{0}$
     $.0000000001 \times 10^{0} = .1000000000 \times 10^{-9}$

the error bound obtained from (4.6) would tell us that the result is $.1000000000 \times 10^{-9} \pm .99999999995 \times 10^{-10}$, or, essentially, that we have no significance left. However, (4.6') says that the result is $.1000000000 \times 10^{-9} \pm .1000000000 \times 10^{-19}$, which is a far more optimistic bound on this particular addition!
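The gap between the two bounds is easy to quantify with exact rationals. The sketch below uses hypothetical function names and takes the value of the constant as an assumption; the ratio of the two bounds, $(|x| + |y|) / |x + y|$, does not depend on that choice.

```python
from fractions import Fraction

def wilkinson_add_bound(x, y, alpha):
    """Absolute error bound of the form (4.6): alpha * (|x| + |y|)."""
    return alpha * (abs(x) + abs(y))

def best_possible_add_bound(x, y, alpha):
    """Absolute error bound of the form (4.6'): alpha * |x + y|."""
    return alpha * abs(x + y)

x = Fraction(1)                                  # .1000000000 x 10^1
y = -Fraction(9999999999, 10 ** 10)              # -.9999999999 x 10^0
alpha = Fraction(1, 2) / 10 ** 10                # assumed value of the constant

wide = wilkinson_add_bound(x, y, alpha)
tight = best_possible_add_bound(x, y, alpha)
print(float(x + y), float(wide), float(tight), wide / tight)
```

For these operands the ratio is about $2 \times 10^{10}$: cancellation shrinks $|x + y|$ by ten orders of magnitude while $|x| + |y|$ stays near 2.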

It  can  be  argued  that  these  summands  are  probably  inaccurate  in  the 
last  decimal  place,  so  that  the  Wilkinson  bound  is  more  realistic.  Perhaps 
this  is  the  case;  however,  that  decision  should  be  left  to  appropriate 
error  analysis  on  the  summands.  The  important  fact  here  is  that,  using 
Best Possible arithmetic, the result of the above problem is computed
exactly,  and  consequently,  the  less  uncertainty  the  bound  reflects,  the 
better  it  is. 

The necessity for such a pessimistic bound as that in (4.6) can be seen from looking at the above example as it might be computed by a computer using typical present-day design:

     $.1000000000 \times 10^{1}$
    $-.09999999999 \times 10^{1}$
     $.0000000001 \times 10^{1} = .1000000000 \times 10^{-8}$

Here, of course, we would have $\alpha = .100000000 \times 10^{-9}$ since the machine truncates before normalizing, and consequently (4.6) yields an estimate of $.1000000000 \times 10^{-9} \pm .19999999999 \times 10^{-9}$; this is not unduly pessimistic. Of course, (4.6') does not apply to hardware of this design.

5.  Applications 

We will mention a few of the applications of Best Possible floating point arithmetic. The rounding $\bigcirc$ has applications in almost every computation using floating point arithmetic. It is this rounding that we expect to get, and (usually erroneously) assume we do get, from a piece of equipment costing several million dollars.
The roundings $\Delta$ and $\nabla$, while not provided with any production computer we know of, also have important applications.

Perhaps the most obvious application of these roundings is in the implementation of interval arithmetic. Hardware designed to produce these roundings would render the programming of an interval arithmetic package nearly trivial, and would enable interval operations to be executed in one tenth to one fifth of the time normally required to execute them with simulated floating point arithmetic (simulation is usually necessary if we are to be able to produce the tightest possible bounds). The formula for addition of two intervals, for example, is

    [a,b] + [c,d] = [a + c, b + d].

If we assume that a, b, c, and d are machine representable numbers and denote the computer approximation to [a,b] + [c,d] by $[a,b] +_M [c,d]$, then the above formula translates as follows:

    $[a,b] +_M [c,d] = [\nabla(a + c), \Delta(b + d)]$.

Evaluation of this formula on a computer equipped with directed rounding takes just twice as long as evaluating the sum of two floating point numbers.
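Interval addition with directed rounding can be sketched in a few lines. The code below is a software model, not the UNIVAC 1108 package of [3]: the names are hypothetical, exact rationals stand in for the accumulator, the exponent range is unbounded, and the defaults (base 10, three-digit fraction) follow the paper's decimal examples.

```python
from fractions import Fraction
import math

def _ulp(x, base, n):
    """Spacing of n-digit machine numbers at the magnitude of nonzero x."""
    e = 0
    while abs(x) >= Fraction(base) ** e:
        e += 1
    while abs(x) < Fraction(base) ** (e - 1):
        e -= 1
    return Fraction(base) ** (e - n)

def round_down(x, base=10, n=3):
    """Largest machine number not exceeding x."""
    x = Fraction(x)
    if x == 0:
        return x
    u = _ulp(x, base, n)
    return math.floor(x / u) * u

def round_up(x, base=10, n=3):
    """Smallest machine number not less than x."""
    x = Fraction(x)
    if x == 0:
        return x
    u = _ulp(x, base, n)
    return math.ceil(x / u) * u

def interval_add(p, q, base=10, n=3):
    """Lower endpoint rounded down, upper endpoint rounded up."""
    (a, b), (c, d) = p, q
    return round_down(a + c, base, n), round_up(b + d, base, n)

p = (Fraction(100, 1000), Fraction(101, 1000))   # [.100, .101]
q = (Fraction(-99, 10 ** 6),) * 2                # the point interval -.990 x 10^-4
print(interval_add(p, q))
```

The result encloses every exact sum of a point of p and a point of q, and each endpoint costs one rounded addition, which matches the factor of two mentioned above.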

An interval arithmetic package for the UNIVAC 1108, using simulated floating point arithmetic as described in [6], is detailed in [3].

Another  consequence  of  directed  roundings  is  that  upper  and  lower 
bounds  for  sequences  of  machine  operations  are  much  more  easily  and 
accurately computed, both a priori and during computation, than is possible
with  conventional  rounding.  This  enables  one  to  combine  a  priori  analysis 
with  computational  considerations  to  produce  rigorous  bounds  for  relative 
error  in  evaluation  of  mathematical  functions.  In  [7],  it  is  proved  that 
on  a  binary  computer  with  optimal  directed  rounding,  the  square  root  of  a 
machine  representable  number  can  be  calculated  exactly  if  it  is  machine 
representable,  and  bracketed  by  two  consecutive  machine  numbers  if  it  is 
not  machine  representable;  this  is  accomplished  without  using  interval 
arithmetic. 

Perhaps  the  main  point  of  this  paper  can  be  summed  up  very  simply: 
floating  point  arithmetic  can  be  this  good,  and  it  can  be  no  better. 

Users of computing equipment should not have to settle for less.


REFERENCES


1. Donald E. Knuth, The Art of Computer Programming, Volume 2, Seminumerical Algorithms, Addison-Wesley Publishing Co., Reading, Mass., 1969.

2. U. Kulisch, An axiomatic approach to rounded computation, Technical Summary Report #1020, Mathematics Research Center, University of Wisconsin, November, 1969.

3. T. D. Ladner and J. M. Yohe, An interval arithmetic package for the UNIVAC 1108, Technical Summary Report #1055, Mathematics Research Center, University of Wisconsin, May, 1970.

4. Lowell Schoenfeld, Floating Point Error Estimates, Chapter IX of Technical Summary Report #721, Mathematics Research Center, University of Wisconsin, August, 1967.

5. J. H. Wilkinson, Rounding Errors in Algebraic Processes, National Physical Laboratory Notes on Applied Science, No. 32, Her Majesty's Stationery Office, London, 1963.

6. J. M. Yohe, Best Possible Floating Point Arithmetic, Technical Summary Report #1054, Mathematics Research Center, University of Wisconsin, March, 1970.

7. J. M. Yohe, Rigorous Bounds on Computed Approximations to Square Roots and Cube Roots, Technical Summary Report #1088, Mathematics Research Center, University of Wisconsin, September, 1970.

8. J. M. Yohe, Accurate Conversion Between Number Bases, Technical Summary Report #1109, Mathematics Research Center, University of Wisconsin, October, 1970.




NUMERICAL SPECTRA AND
APPLICATIONS  TO  COMPUTATIONAL  METHODS 


M.  Z.  Nashed  and  K.  Orlov 
Mathematics  Research  Center 
University  of  Wisconsin 
Madison,  Wisconsin 

ABSTRACT. This paper develops a new technique for computations
based  on  the  theory  of  numerical  spectra  which  results  in  a  considerable 
reduction  in  the  required  arithmetic  operations  and  over-all  computing 
time.  The  essence  of  this  spectral  method  is  an  arithmetization  of  the 
elements  and  algebraic  operations  occurring  in  a  computational  problem. 

The method consists of forming from each element entering the computation
(for instance, a polynomial of an algebraic equation, a matrix, an
interval, etc.) one number called the spectrum of that element.
Calculations with such numbers-spectra are done in the same way as with
ordinary numbers. Each of the spectra occupies one cell of the digital
computer.  After  all  the  necessary  operations  have  been  performed  on  the 
spectra,  the  resulting  spectrum  gives  the  solution  of  the  problem  in 
spectral  form.  By  applying  the  same  rules  as  in  forming  the  spectrum, 
but  inversely,  one  obtains  a  sequence  of  numbers  which  gives  the  solution. 

The  purpose  of  this  paper  is  two-fold:  First,  a  systematic  and 
unifying  theory  of  spectra  and  pseudospectra  is  presented,  with  special 
attention  being  given  to  internal  operations  within  a  given  spectrum  and 
binary  arithmetic  operations  on  spectra.  Second,  applications  of  numerical 
spectral  analysis  are  given  to  computations  with  recursive  relations, 
difference  and  differential  equations,  interval  arithmetic,  solution  of 
polynomial  equations  by  Graeffe's  and  Bernoulli's  methods,  and  some 
computational  methods  of  linear  algebra. 

1. INTRODUCTION. The necessity to deal with complex problems in
mathematics has led to the introduction of mathematical entities that
are more complicated than the real numbers. Examples of such entities
include vectors, matrices, tensors, polynomials, etc. They are composed
of, or related to, numbers in a certain way but they are not
mere numbers. Operations with such entities are more complicated than
arithmetic operations with numbers. In an effort to simplify and to
perform automatically such operations, various arithmetics and routines
have been developed. In the present paper, we develop a theory of
numerical spectra, to be defined in Section 2, and apply it to
computational arithmetic and numerical methods. At the outset, we should
like to point out that the terms "spectrum" and "spectral analysis"
as used in this paper have no connection with their connotations in
functional analysis [9].




The notion of a mathematical spectrum is due to Petrovitch
[10]. The idea of the simplest numerical spectrum is to establish a
certain one-to-one correspondence between a set of elements and a set
of positive integers, methods was made by Orloff [14], [15].

In Section 2, the basic theory of numerical spectra is presented,
with particular attention being given to internal operations within a
spectrum, binary arithmetic operations with spectra, and operations of
choice. The formation of spectra of various elements occurring in
numerical problems is also discussed. In Section 8, pseudospectra
are introduced and applied to solving initial-value problems of
differential equations. In Sections 3-7, new applications of spectra are
given to arithmetic, polynomials, recursive calculations and difference
equations, interval arithmetic, and solutions of polynomial equations.

The applications of spectra to Graeffe's and Bernoulli's methods of
solving polynomial equations and to computational methods of linear
algebra are refinements of earlier results [14], [17], [18], [10].

Examples  accompany  most  of  the  spectral  techniques  introduced  in 
the present paper. These examples demonstrate the computational
procedures in the context of spectral analysis, and show the reduction in
the  number  of  operations,  the  simplicity  and  over-all  reduction  of  time 
in  comparison  with  ordinary  methods.  For  these  advantages  it  is  felt 
that  the  spectral  method  should  be  of  considerable  interest  for  computers. 
The  spectral  method  may  also  play  a  role  in  further  developments  of 
digital  computers. 

2.  BASIC  THEORY  OF  NUMERICAL  SPECTRA. 

2.1  THE  SPECTRA  OF  A  SEQUENCE  OF  POSITIVE  INTEGERS.  Consider  the 
finite  sequence  of  positive  integers 

5,  13,  8,  28  (2.1) 

These integers have different numbers of digits. Before forming the
spectrum of this sequence, these numbers must be transformed into
so-called spectral numbers (spectral integers), that is, integers having
the same number of digits. The necessity to preserve the value of each
number of the sequence (2.1), and at the same time to transform them
into spectral numbers leads to the completion of such numbers by zeros
on their left. Thus the sequences

05, 13, 08, 28
005, 013, 008, 028

are sequences of spectral integers. The number of digits in every
spectral integer in the same sequence is called the uniform spectral
rhythm or briefly the rhythm h. The rhythm of the former sequence is
two while the rhythm of the latter is three.

The ordinary spectrum or briefly the spectrum of a sequence of
positive integers is the number obtained by writing the spectral numbers
consecutively. For example, the numbers $S_1$ = 05130828 and
$S_2$ = 005013008028 are two spectra of the same sequence (2.1), with
rhythms 2 and 3 respectively.

The spectrum can be partitioned into sections, each section
containing only one spectral number. Thus the spectra $S_1$ and $S_2$ are
written

$S_1$ = 05|13|08|28, $S_2$ = 005|013|008|028

The  sections  are  enumerated  from  left  to  right.  The  zeros  on  the  left 
of  the  spectrum  are  ordinarily  omitted.  Thus  the  length  of  each  section 
of  the  spectrum  is  the  same  with  the  possible  exception  of  the  first 
section  which  can  have  fewer  digits.  Note  that  the  rhythm  h  =  1  is 
not  compatible  with  the  sequence  (2.1).  The  individuality  of  each  of 
the  numbers  in  (2.1)  definitely  is  not  lost  and  can  be  restored  if 
necessary  by  the  operation  of  cutting  the  spectrum. 
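
The formation and cutting of an ordinary spectrum can be sketched in a few lines of Python. This is only an illustration of the definitions above, not code from the paper: the function names `form_spectrum` and `cut_spectrum` are hypothetical, and the number of sections `n` is passed explicitly because zeros on the left of the spectrum are omitted.

```python
def form_spectrum(seq, h):
    """Ordinary spectrum of a sequence of positive integers with rhythm h."""
    if any(t < 0 or t >= 10**h for t in seq):
        raise ValueError("rhythm h is not compatible with this sequence")
    # Pad every term with zeros on the left to h digits, then concatenate.
    return int("".join(str(t).zfill(h) for t in seq))


def cut_spectrum(spectrum, h, n):
    """Cut a spectrum with n sections of rhythm h back into its terms."""
    digits = str(spectrum).zfill(n * h)  # restore zeros omitted on the left
    return [int(digits[i:i + h]) for i in range(0, n * h, h)]


s1 = form_spectrum([5, 13, 8, 28], 2)
s2 = form_spectrum([5, 13, 8, 28], 3)
print(s1, cut_spectrum(s1, 2, 4))
print(s2, cut_spectrum(s2, 3, 4))
```

Calling `form_spectrum([5, 13, 8, 28], 1)` raises an error, which mirrors the remark that h = 1 is not compatible with the sequence (2.1).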

The ordinary spectrum is not the only spectrum used in computation.
Another kind of spectrum, called inverse spectrum, is obtained in the
following way. The numbers of the sequence are arranged in opposite
order and afterwards the ordinary spectrum is formed. For example,
the inverse spectrum of the sequence (2.1) with rhythm h = 2 is the
number $\Sigma$ = 28081305. The unit spectrum is defined to be the spectrum
composed of a spectral number one in each of the sections. For example,
the unit spectrum of four terms with h = 2 is 1010101.

2.2  THE  SPECTRUM  OF  A  SEQUENCE  OF  INTEGERS  OF  DIFFERENT  SIGNS. 

A  sequence  of  integers  with  different  signs,  for  instance  the  sequence 

12,  -27,  48,  8,  -13  (2.2) 

can  be  decomposed  into  two  sequences.  The  first  is  composed  from  the 
positive  terms  of  the  sequence  (2.2)  and  zeros  in  the  places  of  the 
negative  numbers,  that  is, 

12, 0, 48, 8, 0 (2.2')

The  second  sequence  is  formed  from  the  absolute  values  of  the  negative 
terms  with  zeros  replacing  the  positive  terms  in  the  sequence,  that  is, 

0,  27,  0,  0,  13  (2.2") 

Let $S^+$ and $S^-$ denote the spectra of the sequences (2.2') and (2.2")
with a compatible common rhythm h. $S^+$ and $S^-$ are called respectively
the positive and negative spectrum of the sequence (2.2). The difference
$S = S^+ - S^-$ is defined to be the spectrum of the sequence (2.2) of
positive and negative integers. The compatible rhythm h with a sequence
of positive and negative integers must satisfy the following inequality

$10^h > 2 \max_i |a_i|$,

where the $a_i$'s are the terms of the sequence. In the example, the
lowest compatible rhythm is h = 2 and the spectrum formed with this
rhythm is

S = 11|73|48|07|87 (2.3)

The  sections  of  the  spectrum  are  called  big  sections  if  they  begin  with 
big  digits  (5  -  9)  and  small  sections  if  they  begin  with  small  digits 
(0  -  4).  The  beginning  digit  itself  is  called  the  characteristic 
digit  of  the  section. 

Now it is necessary to give the solution of the inverse problem
of forming the spectrum, i.e., to recover the sequence from the
spectrum. For this purpose we introduce the notions of nominal,
corrected and effective value of a section of a spectrum. The nominal
value is just the number written in this section of the spectrum. The
corrected value of a section is equal to its nominal value if the
characteristic digit of the following section is a small digit and is
one greater than the nominal value if this characteristic digit is a
big one. The corrected value of the last section is the same as the
nominal value. The effective value of a small section is equal to the
corrected value of this section; the effective value of a big section
is a negative number which is the difference between the corrected
value of this section and $10^h$. It follows readily that the effective
values of the sections are the terms of the sequence. In the previous
example, the nominal values of the sections are

11,  73,  48,  07,  87, 

the  corrected  values  are 

12,  73,  48,  08,  87, 

and  the  effective  values  are 

12,  -27,  48,  8,  -13. 
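
The recovery rule just described can be sketched as follows. The function names are hypothetical, and the compatibility check `2 * |term| < 10**h` is an assumption inferred from the big/small digit convention rather than a formula quoted from the paper. For a negative spectrum the sketch works on the absolute value and changes the sign of every effective value.

```python
def signed_spectrum(seq, h):
    """Spectrum S = S+ - S- of a sequence of signed integers."""
    # Assumed compatibility condition: every |term| is below 10**h / 2.
    if any(2 * abs(t) >= 10**h for t in seq):
        raise ValueError("rhythm h is not compatible with this sequence")
    n = len(seq)
    return sum(t * 10**((n - 1 - k) * h) for k, t in enumerate(seq))


def recover(spectrum, h, n):
    """Return (nominal, corrected, effective) values of the n sections."""
    sign = -1 if spectrum < 0 else 1
    digits = str(abs(spectrum)).zfill(n * h)
    nominal = [int(digits[i:i + h]) for i in range(0, n * h, h)]
    big = [v >= 5 * 10**(h - 1) for v in nominal]  # characteristic digit 5-9
    # Add one when the following section is big; the last section is unchanged.
    corrected = [v + 1 if k + 1 < n and big[k + 1] else v
                 for k, v in enumerate(nominal)]
    effective = [sign * (c - 10**h if b else c)
                 for c, b in zip(corrected, big)]
    return nominal, corrected, effective


S = signed_spectrum([12, -27, 48, 8, -13], 2)
print(S, recover(S, 2, 5))
```

Applied to the sequence (2.2) with h = 2, the three lists returned by `recover` are the nominal, corrected and effective values listed above.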

If a spectrum has an odd number of sections, the effective value of
the middle section is called the middle part of the spectrum and is
denoted by M(S). In the case of a spectrum with an even number of
sections, we have two middle sections denoted by $M_i(S)$, i = 1,2.
Obviously it is possible to form an inverse spectrum of a sequence
composed from positive and negative integers. The inverse spectrum of
the sequence (2.2) with the rhythm h = 2 is

$\Sigma$ = -12|91|52|26|88 (2.4)

The ordinary (inverse) spectrum is either a positive or a negative
number. In order to recover the terms of the sequence from a
spectrum which is a negative number, we first obtain the effective
values of the sections of the absolute value of the spectrum and
then change the sign of every effective value.

2.3.  OPERATIONS  WITH  SPECTRA.  Spectra  are  numbers.  It  is 
possible  therefore  to  perform  the  usual  arithmetical  operations  on 
spectra.  But  there  are  internal  operations  which  are  applicable  only 
to  the  spectrum  such  as  the  operations  of  cutting  of  the  spectrum 
into  sections,  mentioned  earlier.  Another  cutting  of  the  spectrum 
into  parts,  which  are  not  the  sections,  is  ordinarily  used  in  the  case 
when the spectrum has too many digits, so it cannot enter into the
computer as a whole. In this case the length of the parts is determined
by the capacity of the computer or desk calculator. Such cutting
is a purely operational one without other significance.

Another  operation  is  the  transposition  of  the  spectrum  to  another 
rhythm.  A  transposition  to  a  greater  rhythm  is  called  a  dilution 
of  the  spectrum  and  the  new  spectrum  is  called  a  diluted  spectrum. 

The  dilution  of  a  spectrum  from  a  rhythm  h  to  a  rhythm  H  >  h  is  obtained 
by  putting  H  -  h  digits  of  9  on  the  left  of  each  big  section  and  H  -  h 
digits  of  0  on  the  left  of  each  small  section,  with  the  exception  of 
the  first  section  which  is  left  unchanged.  For  example  the  dilution 
of  the  spectrum  (2.3)  to  the  rhythm  H  =  4  gives  the  spectrum 

11|9973|0048|0007|9987 (2.5)

The dilution of the inverse spectrum (2.4) to H = 3 gives the spectrum

$\Sigma$ = -12|991|952|026|988.
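
A minimal sketch of dilution, assuming the spectrum is stored as a Python integer and the number of sections `n` is known. The name `dilute` is hypothetical; a negative spectrum is handled through its absolute value, as in the example (2.4).

```python
def dilute(spectrum, h, H, n):
    """Transpose a spectrum with n sections from rhythm h to rhythm H > h."""
    sign = -1 if spectrum < 0 else 1
    digits = str(abs(spectrum)).zfill(n * h)
    parts = [digits[i:i + h] for i in range(0, n * h, h)]
    out = [parts[0]]                            # first section is unchanged
    for sec in parts[1:]:
        filler = "9" if sec[0] >= "5" else "0"  # big or small section
        out.append(filler * (H - h) + sec)
    return sign * int("".join(out))


print(dilute(1173480787, 2, 4, 5))
print(dilute(-1291522688, 2, 3, 5))
```

The diluted number agrees with the spectrum formed directly from the sequence with the larger rhythm, which is a convenient check on the rule.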

The transposition from a rhythm H to a smaller rhythm h is called
condensation. If a spectrum with rhythm H has the property that
every section of the spectrum begins with H - k zeros or H - k digits
of 9, then this spectrum can be condensed to a smaller rhythm h = H - j,
j = 1, 2, ..., k-1. It can be condensed to the rhythm h = H - k only if
the (k + 1)st digit in each section is of the same kind (big or small)
as the characteristic digit of this section. The condensation is
effected by the deletion of the first j digits of each section. For
example the spectrum

11|9943|0007|9987

can be condensed to the rhythm h = 3, but not to the rhythm h = 2,
whereas the spectrum

11|9973|0007|9987

can be condensed to the rhythm h = 2.

The  operation  of  elongation  is  applied  directly  to  a  sequence 
and  consists  of  adjoining  to  the  sequence  in  certain  places  new  terms 
of  zero.  For  example,  the  sequence  (2.2)  elongated  on  the  second, 
fourth  and  fifth  terms  will  be 

12,  0,  -27,  0,  0,  48,  8,  -13. 

Sometimes for certain purposes it is necessary to form a spectrum
which is slightly different from the ordinary (or inverse) spectrum.
Such a spectrum is called a corrected spectrum and is denoted by S.
The most useful corrected spectrum is the spectrum obtained by
subtracting from certain sections of the original spectrum twice their
effective values. For example we will use the spectrum corrected in
the last section or the spectrum corrected in all the even sections.

The operation of ablation of the last r sections in a spectrum of
rhythm h consists of rounding off (in the ordinary way) this spectrum by
rh digits and then deleting the last rh zeros. Note that to
the ablated spectrum corresponds the same sequence without its last
r terms. The ablated spectrum is denoted by S. For example, the
ablation of the last section of the spectrum (2.3) leads to

S = 11|73|48|08
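
Ablation amounts to an ordinary rounding followed by a division by a power of ten, which can be sketched with integer arithmetic. The name `ablate` is hypothetical, and treating a negative spectrum through its absolute value is an assumption made here for symmetry with the recovery rule.

```python
def ablate(spectrum, h, r):
    """Drop the last r sections of a spectrum with rhythm h."""
    scale = 10**(r * h)
    sign = -1 if spectrum < 0 else 1
    # Round to the nearest multiple of 10**(r*h), then delete the zeros.
    return sign * ((abs(spectrum) + scale // 2) // scale)


print(ablate(1173480787, 2, 1))
print(ablate(1173480787, 2, 2))
```

The first call returns the spectrum of 12, -27, 48, 8 and the second that of 12, -27, 48, i.e. the sequence (2.2) without its last one or two terms.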

Another important operation with spectra is the internal or formal
rounding off. This operation consists of applying the ordinary rounding
off, but only to the big sections. The operation is carried out beginning
from right to left; the big sections are replaced by zeros and each
spectral number on the left of a big section is increased by one. For
example, the spectrum (2.3), formally rounded off will be

12|00|48|08|00

It is obvious that in this way we obtain the positive spectrum $S^+$ of
the original sequence (2.2). We give another example of formal rounding
off. The spectrum S = 0|9986|9981|9979|0041|9999|0001 belongs to the
sequence 1, -13, -18, -21, 42, -1, 1 for h = 4, and the formally rounded
off spectrum is 1|0000|0000|0000|0042|0000|0001.
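
The right-to-left procedure of formal rounding off can be sketched as below. The name `formal_round` is hypothetical and the sketch assumes a nonnegative spectrum whose number of sections `n` is known.

```python
def formal_round(spectrum, h, n):
    """Formally round off a nonnegative spectrum with n sections of rhythm h."""
    if spectrum < 0:
        raise ValueError("this sketch handles nonnegative spectra only")
    digits = str(spectrum).zfill(n * h)
    vals = [int(digits[i:i + h]) for i in range(0, n * h, h)]
    for k in range(n - 1, -1, -1):            # from right to left
        if vals[k] >= 5 * 10**(h - 1):        # big section
            vals[k] = 0
            if k > 0:
                vals[k - 1] += 1              # increase the left neighbour
    return int("".join(str(v).zfill(h) for v in vals))


print(formal_round(1173480787, 2, 5))
```

Because the loop inspects each section after its right neighbour has been processed, a section that becomes big only through the added one is also replaced by zero and passed on to the left.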

Finally we shall mention the operation of choice. Examples of
such operations were described earlier by obtaining the positive and
negative spectra $S^+$ and $S^-$ of the sequence (2.2), where the positive
and negative terms were chosen respectively. The most commonly used choice
operations besides those just mentioned are: (i) the choice of the odd
terms of the sequence; (ii) the choice of the even terms of the sequence.
In all the operations of choice, all the non-chosen terms are replaced
by zeros. The spectra of the transformed sequences under the operations
of choice (i) and (ii) are denoted by $S_1$ and $S_2$ respectively. Combined
with the choice of positive and negative terms, these lead to four new
operations of choice:

(1) the choice of positive terms of the sequence on odd places;

(2) the choice of negative terms of the sequence on odd places;

(3) the choice of positive terms on even places;

(4) the choice of negative terms on even places.

The corresponding spectra are denoted by $S_1^+$, $S_1^-$, $S_2^+$ and
$S_2^-$ respectively.

Note that in a choice of negative numbers ($S^-$, $S_1^-$, $S_2^-$), the
transformed sequence is formed by replacing each chosen negative by its
absolute value, and each of the non-chosen numbers by zero; thus the four
spectra designated above are spectra of sequences of nonnegative integers.
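
The operations of choice act on the sequence before the spectrum is formed, so they are easy to sketch. The names `choose` and `spectrum` and the keyword arguments are hypothetical; places are counted from 1, as in the text.

```python
def spectrum(seq, h):
    """Spectrum of a sequence of (possibly signed) integers with rhythm h."""
    n = len(seq)
    return sum(t * 10**((n - 1 - k) * h) for k, t in enumerate(seq))


def choose(seq, sign=None, places=None):
    """Apply an operation of choice; non-chosen terms become zero.

    sign:   None, "+" (positive terms) or "-" (negative terms, which are
            returned as absolute values)
    places: None, "odd" or "even"
    """
    out = []
    for place, t in enumerate(seq, start=1):
        keep = True
        if places == "odd":
            keep = place % 2 == 1
        elif places == "even":
            keep = place % 2 == 0
        if sign == "+":
            keep = keep and t > 0
        elif sign == "-":
            keep = keep and t < 0
        out.append((abs(t) if sign == "-" else t) if keep else 0)
    return out


seq = [12, -27, 48, 8, -13]
s_plus = spectrum(choose(seq, sign="+"), 2)
s_minus = spectrum(choose(seq, sign="-"), 2)
print(s_plus, s_minus, s_plus - s_minus)
```

For the sequence (2.2) the difference of the positive and negative spectra reproduces the spectrum (2.3).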

2.4.  SPECTRA  OF  RATIONAL  NUMBERS,  INTERVALS,  VECTORS,  MATRICES, 
POLYNOMIALS  AND  FUNCTIONS.  The  spectrum  of  a  rational  number  a/b 
is  defined  to  be  the  spectrum  of  the  sequence  a,b.  We  also  define  the 
spectrum  of  an  interval  [a:b]  to  be  the  spectrum  of  the  sequence  a,b. 

This latter definition enables us to apply numerical spectra to
computations with interval arithmetic ([12], [22], [6]). (This dual use of
the spectrum of a sequence a,b should not cause any confusion if we
stress the mathematical notions we are dealing with.) The spectrum of
a vector is the spectrum of the sequence of its components. The row
(column) spectrum of a matrix is the spectrum of all of its elements
taken row by row (column by column). The decreasing spectrum of a
polynomial $P(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_n$ is the spectrum of the
sequence $a_0, a_1, \dots, a_n$. The increasing spectrum of the polynomial P
is the spectrum of the sequence $a_n, \dots, a_0$. It is obvious that the
relation between these two numbers is the same as between ordinary and
inverse spectra of the same sequence of numbers.

One of the most important properties of the spectra of polynomials
is that the decreasing spectrum of a polynomial is the numerical value
of the polynomial for $x = 10^h$:

$S_d = P(10^h)$. (2.6)

Similarly we have

$S_i = 10^{nh} P(10^{-h})$ (2.7)

where n is the degree of the polynomial and h is the rhythm of the
spectrum. The subscripts d and i will be dropped when one kind of
spectra of polynomials is used.
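
The relations (2.6) and (2.7) are easy to check numerically. The sketch below uses an arbitrary illustrative polynomial (not one taken from the paper) and exact rational arithmetic from the standard `fractions` module; the helper names are hypothetical.

```python
from fractions import Fraction


def spectrum(seq, h):
    """Spectrum of a sequence of signed integers with rhythm h."""
    n = len(seq)
    return sum(t * 10**((n - 1 - k) * h) for k, t in enumerate(seq))


def evaluate(coeffs, x):
    """Horner evaluation of a_0 x^n + a_1 x^(n-1) + ... + a_n."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


coeffs = [3, -2, 5]                 # P(x) = 3x^2 - 2x + 5, illustrative only
h, n = 2, len(coeffs) - 1
s_d = spectrum(coeffs, h)           # decreasing spectrum
s_i = spectrum(coeffs[::-1], h)     # increasing spectrum
print(s_d, evaluate(coeffs, 10**h))
print(s_i, 10**(n * h) * evaluate(coeffs, Fraction(1, 10**h)))
```

Both lines print a pair of equal numbers, the first pair illustrating (2.6) and the second (2.7).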


In many applications it is not necessary to require the spectrum
to be an integer; the value $P(10^{-h})$ may be considered as a spectrum of
the polynomial. Such a spectrum is called noninteger spectrum of the
polynomial and is denoted by $S^*$. Thus

$S^* = P(10^{-h})$ (2.8)

This new notion of spectrum gives a possibility of forming spectra
of functions.

Let a function f possess the following properties: (1) it has the
Taylor series expansion at the point a in its domain U; (2) the
coefficients of the series are integers (+) admitting a rhythm h such that
$a + 10^{-h} \in U$. Then the number

$S^* = f(a + 10^{-h})$ (2.9)

is defined to be the noninteger spectrum with rhythm h of the function
f at the point a. It is obvious that such a spectrum is an infinite
decimal number. Note that if the spectrum $f(a + 10^{-h})$ is known
approximately, to the nh-th decimal, it represents the spectrum of the Taylor
polynomial of degree n associated with f at the point a.
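
As an illustration of (2.9), consider the function f(x) = 1/(1 - x - x^2) at a = 0, whose Taylor coefficients are the Fibonacci numbers 1, 1, 2, 3, 5, 8, 13, ... This function is chosen here only as an example and does not come from the paper. Since all its coefficients are nonnegative, the nominal values of the sections are already the coefficients; the sketch truncates (rather than rounds) to nh decimals, which is a simplification that is safe for this example.

```python
from fractions import Fraction


def taylor_sections(value, h, n):
    """Cut a positive noninteger spectrum, kept to n*h decimals, into n+1 sections."""
    scaled = int(value * 10**(n * h))            # truncate after n*h decimals
    digits = str(scaled).zfill((n + 1) * h)
    return [int(digits[i:i + h]) for i in range(0, (n + 1) * h, h)]


h = 2
x = Fraction(1, 10**h)
s_star = 1 / (1 - x - x * x)       # f(0 + 10^-h) = 1.0102030508132134...
print(taylor_sections(s_star, h, 6))
```

With h = 2 and n = 6 the sections are the coefficients of the Taylor polynomial of degree 6. Coefficients that violate the rhythm (here those of three digits) would spill into neighbouring sections, so a larger h is needed for a higher degree.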

OPERATIONS OF CHOICE APPLIED TO POLYNOMIALS. The operations of
choice defined for sequences can also be applied to polynomials. We
consider a polynomial P(x) arranged by decreasing powers. One operation
of choice leads to the polynomials $P_1(x)$ and $P_2(x)$ composed from odd
(respectively even) terms of the polynomial. If n is an even number,
then $P_1$ is an even function and $P_2$ is an odd function, and vice versa
if n is an odd number.

Another operation of choice leads to the "positive part", denoted
by $P^+(x)$, of the polynomial P(x) which is the part composed from terms
with positive coefficients. Similarly the negative part, denoted by
$P^-(x)$, is the polynomial composed from absolute values of the negative
terms of P(x).

Consecutive applications of these two choices lead to four
polynomials $P_1^+(x)$, $P_1^-(x)$, $P_2^+(x)$, $P_2^-(x)$ defined as follows:

$P_j^+(x) = [P_j(x)]^+$, $P_j^-(x) = [P_j(x)]^-$, j = 1, 2.

Thus we may associate with each given polynomial, in addition to
its complete spectrum S, the following spectra: $S_1$, $S_2$, $S^+$, $S^-$,
$S_1^+$, $S_1^-$, $S_2^+$, $S_2^-$. The numerical values of these spectra are

$S = P(10^h)$,

$S^+ = P^+(10^h)$, $S^- = P^-(10^h)$,

$S_j^+ = P_j^+(10^h)$, $S_j^- = P_j^-(10^h)$, j = 1, 2.

(+) This requirement is weakened in subsequent sections.

2.5 SPECTRA OF POLYNOMIALS IN SEVERAL VARIABLES. Let P(x,y) be
a polynomial in two variables with integer coefficients. We arbitrarily
choose one of the variables, for instance x, to be the principal variable.
Then P(x,y) can be written in the form

$P(x,y) = \sum_{j=0}^{n} a_j(y) x^j$ (2.10)

where $a_j(y)$ is a polynomial in y with integer coefficients. Let m
denote the greatest degree of the polynomials $a_j(y)$, j = 0, 1, ..., n.
Let h denote a compatible common rhythm for these polynomials. The
decreasing spectrum of the polynomial $a_j(y)$ is given by

$s_j = a_j(10^h)$, j = 0, 1, 2, ..., n. (2.11)

Now we form the polynomial

$Q(x) = \sum_{j=0}^{n} s_j x^j$ (2.12)

Taking into consideration the fact that each of the coefficients $s_j$ is an
integer with absolute value less than $10^{(m+1)h}$, we can form all kinds
of spectra of the polynomial Q(x) with the rhythm H = (m+1)h or with
greater rhythm. For instance, the decreasing spectrum S of the
polynomial Q(x) will have the following value

$S = Q(10^H) = \sum_{j=0}^{n} s_j 10^{jH} = \sum_{j=0}^{n} 10^{jH} a_j(10^h) = P(10^H, 10^h)$ (2.13)

The last formula is obtained using (2.10) - (2.12).

This kind of spectrum is also called spectrum with double rhythm H and
h. The rhythm H is usually taken to be a multiple of the rhythm h.
Thus the numerical value $P(10^H, 10^h)$ is called the
decreasing spectrum in x and y of the polynomial P(x,y) and is denoted by
$S_{dd}$. The number $P(10^H, 10^{-h})$ is denoted by $S^*_{di}$ and is
called the spectrum decreasing in x and increasing in y. Similarly

$S^*_{id} = P(10^{-H}, 10^h)$ and

$S^*_{ii} = P(10^{-H}, 10^{-h})$ (2.14)

The  last  notion  of  spectrum  (2.14)  can  be  extended  to  functions 
developable  in  Taylor  series  with  integer  coefficients  and  satisfying 
certain  conditions. 

The spectra of polynomials in any finite number of variables
may be obtained in the same way as for polynomials in two variables.
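
The identity (2.13) can be checked on a small example. The polynomial below is an arbitrary illustration, and the data layout (a list of coefficient lists, one per power of x) and function names are assumptions made for this sketch.

```python
def spectrum(seq, h):
    """Spectrum of a sequence of signed integers with rhythm h."""
    n = len(seq)
    return sum(t * 10**((n - 1 - k) * h) for k, t in enumerate(seq))


def double_rhythm_spectrum(a, h, H):
    """a[j] lists the coefficients of a_j(y), highest power of y first."""
    s = [spectrum(a_j, h) for a_j in a]          # s_j = a_j(10^h), as in (2.11)
    return sum(s_j * 10**(j * H) for j, s_j in enumerate(s))   # Q(10^H)


def evaluate(a, x, y):
    """Direct evaluation of P(x, y) = sum_j a_j(y) x^j."""
    total = 0
    for j, a_j in enumerate(a):
        a_val = 0
        for c in a_j:
            a_val = a_val * y + c
        total += a_val * x**j
    return total


a = [[1, 0, 4], [2, 3]]     # P(x, y) = (y^2 + 4) + (2y + 3) x, illustrative
h, m = 2, 2                 # m is the greatest degree in y
H = (m + 1) * h
S = double_rhythm_spectrum(a, h, H)
print(S, evaluate(a, 10**H, 10**h))
```

Cutting S = 203|010004 by H = 6 and then each section by h = 2 gives 00|02|03 and 01|00|04, the coefficients of 2y + 3 and y^2 + 4.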

REMARK: A spectrum of a matrix can be considered as a spectrum
with double rhythm. In this case the spectrum is partitioned by means
of the rhythm H into sections corresponding to the rows (or columns)
and afterwards by means of the rhythm h into subsections corresponding
to the elements. If H = mh, where m is the number of rows (or columns),
this different approach will have no effect on the numerical value of
the spectrum itself. But if $H \ne mh$, then the spectra formed by means
of only h, or by means of H and h, will have different numerical values.

REMARK: At first glance the condition that the terms of the
sequence must be integers appears to be a severe restriction on the use
of the spectra. Essentially this is not true, because if the sequence
is formed from finite decimal numbers, they can be transformed for
purposes of numerical spectra into integers (by multiplication by $10^k$,
for some sufficiently large positive integer k, or by other means).
Keeping in mind that in practical calculations only finite decimal
numbers are used, we see that the conditions for applicability of the
spectral method do not place any severe restriction in actual
computations. Of course, it is possible to calculate with decimal numbers
directly using spectra of sequences of decimal numbers as developed
below.

2.6.  SPECTRA  OF  A  SEQUENCE  OF  FINITE  DECIMAL  NUMBERS.  It  is  only 
necessary  to  explain  the  formation  of  the  ordinary  spectrum  of  a 
sequence  of  positive  decimal  numbers  because  all  the  generalizations 
derived  from  this  notion  are  made  for  decimal  numbers  in  the  same 
way  as  for  integers. 

Consider  the  following  sequence  of  positive  decimal  numbers 

12.623  0.17  25.1  (2.15) 

The corresponding spectral numbers for integers are numbers having the
same number of digits. For decimal numbers, the spectral numbers must
have the same number of digits for the integer part and the same number
of digits for the decimal part, but these two numbers may differ. Thus
spectral decimal numbers of the sequence (2.15) are

12.623 00.170 25.100

or

012.623 000.170 025.100

In the first case the rhythms for the decimal and integer parts
are different; in the second case, they are equal. Thus there occur
two numbers having connection with the rhythm. The sum of these two
numbers is called the main rhythm of the spectrum of decimal numbers
and is denoted by h. The rhythm for the decimal parts is denoted by
p and is called the point rhythm of the spectrum.

The ordinary spectrum of a sequence of decimal numbers is obtained
by writing one after another all the spectral numbers. Thus the
spectrum S for the sequence (2.15) with h = 6, p = 3 is the number

S  =  012623000170025100. 

The spectrum of a sequence of decimal numbers, like the spectrum of a
sequence of integers, is an integer. The only difference occurs in the
partitioning of such spectra. The ordinary integer spectrum with single
rhythm is partitioned into sections only, and each section contains one
integer. The integer spectrum with double rhythm (H and h) is partitioned
first into sections (by the rhythm H) and then each section is
partitioned into subsections (by the rhythm h). Every section contains one
integer only. The same is true for every subsection. In the case of
a spectrum of decimal numbers, the spectrum is partitioned by the main
rhythm h, but every section of the spectrum of the sequence of decimal
numbers contains a decimal number. Thus the main rhythm h in decimal
spectra plays the same role as the rhythm h in integer spectra. The
point rhythm p does not serve in further partitioning of the section into
subsections. Its only purpose is to place the decimal point in each
section after p digits, counting from right to left. The spectrum S
partitioned into sections and written with the decimal points is

S = 012.623|000.170|025.100

The decimal points, the signs separating the sections as well as the
zero(s) at the beginning of the first section may be omitted without
introducing any ambiguity. From S = 12623000170025100 it is very easy
to obtain by means of h and p the sequence (2.15).

The notion of the spectrum of a sequence of positive and negative
decimal numbers is obtained in a similar way as for integers
because the symbols $S^+$ and $S^-$ have the same meaning also in this case.
This remark also applies to inverse spectra and corrected spectra.
All the specific operations with spectra mentioned above are applicable
to the spectra of decimal numbers in the same way as they are for
integers.
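
Forming and cutting a spectrum of positive decimal numbers can be sketched with the standard `decimal` module. The function names are hypothetical; the numbers are passed as strings so that no binary rounding enters, and the number of sections `n` is supplied when cutting.

```python
from decimal import Decimal


def decimal_spectrum(seq, h, p):
    """D-spectrum of positive decimal numbers (given as strings)."""
    sections = []
    for text in seq:
        scaled = Decimal(text).scaleb(p)     # move the point p places right
        if scaled != scaled.to_integral_value() or not 0 <= scaled < 10**h:
            raise ValueError("h and p are not compatible with " + text)
        sections.append(str(int(scaled)).zfill(h))
    return int("".join(sections))


def decimal_terms(spectrum, h, p, n):
    """Recover the n decimal numbers from a D-spectrum."""
    digits = str(spectrum).zfill(n * h)
    return [Decimal(int(digits[i:i + h])).scaleb(-p)
            for i in range(0, n * h, h)]


S = decimal_spectrum(["12.623", "0.17", "25.1"], 6, 3)
print(S, decimal_terms(S, 6, 3, 3))
```

With h = 6 and p = 3 this reproduces the spectrum of the sequence (2.15) and recovers its terms.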

The formulae (2.6) and (2.7) are modified for polynomials with
finite decimal numbers as follows:

$S_d = 10^p P(10^h)$, $S_i = 10^{nh+p} P(10^{-h})$.

The formulae (2.8) and (2.9) remain valid. Thus, to avoid any
misunderstanding, it must always be stressed whether the spectrum is the
spectrum of integers (I-spectrum) or the spectrum of decimal numbers
(D-spectrum).

The partitioning of all noninteger spectra (marked with *) in the
case of integers starts always in both directions (right and left)
from the decimal point of the noninteger spectrum. The partitioning
of a noninteger spectrum from decimal numbers starts in both directions
also, but not from the decimal point of the noninteger spectrum; instead
it starts from the point which is p digits to the right of the decimal
point. Thus the internal operations on a D-spectrum are very similar
to the internal operations on an I-spectrum described in Section 2.3
and it is not necessary to add any further explanation.

The arithmetical operations with spectra are binary operations. We
consider here addition and multiplication of spectra. We note that an
operation of a spectrum and a number does not require special consideration
because any number is considered, in the context of numerical spectral
analysis, to be an I-spectrum with one section only if this number is
an integer, or a D-spectrum with one section if this number is a decimal
one. We also observe the general rule that the rhythm h for all the
spectra used in one problem must be the same. This rhythm remains the
same under any arithmetical operation.

Addition  is  defined  only  on  I-spectra  or  on  D-spectra  with  the 
same  h  and  p.  If  it  is  necessary  to  add  an  I-spectrum  and  a  D-spectrum, 
the  first  must  be  transformed  into  a  D-spectrum  (with  the  same  p). 
Multiplication is defined for any two spectra. If both are D-spectra,
the point rhythm will be the sum $p_1 + p_2$ of the point rhythms of the
factor spectra. Thus an integer spectrum can be considered as a
D-spectrum with p = 0.
3. APPLICATIONS OF NUMERICAL SPECTRA TO CALCULATIONS WITH POLYNOMIALS.

3.1. THE NUMERICAL SPECTRAL METHOD IN ALGEBRA AND ANALYSIS.

The spectral method to be developed in this paper is essentially
a method of arithmetization of algebraic or analytic problems.
The analysis of such problems often reduces to a set of operators
$\Phi_i$ and to the computation of elements defined by

    $F_i = \Phi_i(f_1, f_2, \ldots, f_n)$,   $i = 1, 2, \ldots, m$        (3.1)

where each $f_i$ is an element belonging to a set $X_i$. This may also be
written as a single operator $\Phi$ on the space $X = X_1 \times \cdots \times X_n$
into the space $Y = Y_1 \times \cdots \times Y_m$, where $F_i \in Y_i$:

    $F = \Phi(f)$,   $f = (f_1, \ldots, f_n)$,   $F = (F_1, \ldots, F_m)$.

Thus the operator $\Phi$ represents the totality of operations (of any kind)
to be performed on the elements $f_i$. The elements $F_1, \ldots, F_m$ may be
interpreted as the output of a multi-input system. In the present
exposition the elements $f_i$ can be functions of one or several variables,
sequences, vectors, matrices, etc.

The essence of the spectral method is to obtain the elements
$F_i$ using arithmetical operations. This end is achieved in three
steps. The first step is the transformation of all the elements $f_i$
into numbers, their spectra $S_i$, $i = 1, 2, \ldots, n$. The second step is
the calculation of a number called the resulting spectrum S of the
required F. This calculation is performed by means of a formula

    $S = \Phi^{*}(S_1, \ldots, S_n)$        (3.2)

where $\Phi^{*}$ is the totality of arithmetical operations to be performed.
Formula (3.2) entails an arithmetization of the operators $\Phi_i$,
$i = 1, \ldots, m$. The third and last step is to find the inverse
transform of S. This means the translation of the number S into the
element F.

3.2. CALCULATIONS WITH POLYNOMIALS. We first consider the application
of the spectral method to the computation of a linear combination
of a given set of polynomials. We shall assume that the coefficients
of the polynomials and the coefficients of the linear combination
are integers. The problem then is to find

    $P(x) = \sum_{j=1}^{n} a_j P_j(x)$        (3.3)

where $P_j(x)$, $j = 1, \ldots, n$, are given polynomials. Putting $x = 10^h$,
we obtain

    $P(10^h) = \sum_{j=1}^{n} a_j P_j(10^h)$.

Thus if the rhythm h is compatible not only with the polynomials $P_j$
but also with the resulting polynomial P, then

    $S = \sum_{j=1}^{n} a_j S_j$        (3.4)

Note that (3.3) and (3.4) are specific realizations of the abstract
relations (3.1) and (3.2) respectively. The formula (3.4) suggests
the following rule: all the spectra of the elements $f_i$ entering
in the transformation (3.1) must be formed by means of the same
rhythm h.

Thus the main problem is to establish the formula for calculating
the resulting spectrum (in our case (3.4)) and to find the rhythm h
which will surely be compatible with the $P_j$ and with P.

The determination of a rhythm compatible with the given polynomials
is trivial. There are two ways of finding the rhythm h compatible
with the polynomial P. One is the precise and the other is a rough
way. Both are based on the use of majorants.

We first introduce the notion of the indicatory number. For any
positive integer p, let (p) denote the number of digits in p. The
indicatory number (associated with fixed majorants) is a number $\delta$
such that $(\delta)$ is the compatible rhythm.

Let A be the maximum of the absolute values of the coefficients
of the polynomials $P_i$, $i = 1, \ldots, n$, and $a = \max |a_j|$. Then it
follows readily that

    $\delta = 2naA$        (3.5)

is the indicatory number. A less precise way of obtaining a compatible
rhythm is given by

    $h = (a) + (A) + [\log 2n] + 1$        (3.6)

where [u] denotes the greatest integer in u. Note that the formula (3.6)
takes into consideration the number of digits in each of the coefficients
but not the values of these coefficients. We call this the rough
rhythm and $(\delta)$ the precise rhythm, since in general $(\delta) < h$, where
h is the rough rhythm given by (3.6). Each of these two methods has its
advantages. The first gives a smaller rhythm but requires more
calculations. The second requires only a trivial calculation but gives
in general a less precise rhythm.
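The two rhythm rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, log is taken to base 10, and [log 2n] is computed as the digit count of 2n minus one. The sample values n = 3, a = 1, A = 54 are chosen for illustration.

```python
def digits(p: int) -> int:
    """(p) in the text: the number of decimal digits of a positive integer p."""
    return len(str(p))


def precise_rhythm(n: int, a: int, A: int) -> int:
    """(delta), where delta = 2*n*a*A is the indicatory number of (3.5)."""
    return digits(2 * n * a * A)


def rough_rhythm(n: int, a: int, A: int) -> int:
    """h = (a) + (A) + [log 2n] + 1 of (3.6), with [log 2n] = digits(2n) - 1."""
    return digits(a) + digits(A) + (digits(2 * n) - 1) + 1


# Illustrative values: three polynomials, unit multipliers, largest coefficient 54.
h_precise = precise_rhythm(3, 1, 54)
h_rough = rough_rhythm(3, 1, 54)
```

The precise rule needs the product 2naA itself, while the rough rule only counts digits, which is why it can overshoot.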

Example. Add the following polynomials:

    $P_1(x) = 32x^2 - 18x + 32$,  $P_2(x) = 54x^2 - 42x - 28$,  $P_3(x) = -8x^2 + 9x + 47$.

The above mentioned numbers are

    a = 1, (a) = 1, A = 54, (A) = 2, n = 3, $\delta$ = 324.

The rhythm determined in the precise way is h = 3 and in the rough way h = 4.
The decreasing spectra of the polynomials (with h = 3) are

    $S_1$ = 31|982|032,  $S_2$ = 53|957|972,  $S_3$ = -7|990|953.

The resulting spectrum S is 77|949|051, from which we get the sum
of the polynomials,

    $P(x) = 78x^2 - 51x + 51$.
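The addition example can be replayed as a short Python sketch. The helper names `spectrum` and `coefficients` are hypothetical, and the decoding rule (a section in the upper half of its range stands for a negative coefficient and passes a carry to the section on its left) is an assumption about how the sections are read back, consistent with the numbers in the example.

```python
def spectrum(coeffs, h):
    """Decreasing spectrum P(10**h); coeffs run from the leading term down."""
    s = 0
    for c in coeffs:
        s = s * 10**h + c
    return s


def coefficients(s, h, k):
    """Read k signed sections of width h back from the spectrum s."""
    base = 10**h
    out = []
    for _ in range(k):
        r = s % base            # Python's % is non-negative for a positive base
        if r >= base // 2:      # upper half of a section encodes a negative value
            r -= base
        out.append(r)
        s = (s - r) // base     # exact division; passes the carry to the left
    return out[::-1]


polys = [[32, -18, 32], [54, -42, -28], [-8, 9, 47]]
spectra = [spectrum(p, 3) for p in polys]
total = sum(spectra)                 # one addition per polynomial, as in (3.4)
result = coefficients(total, 3, 3)   # coefficients of the sum polynomial
```

Python integers have arbitrary precision, so the "capacity of the machine" is not a concern in this sketch.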

It is obvious that the advantage of spectral calculations occurs
only when calculating machines (desk calculators or computers) are
used. The advantage increases with the capacity of the machine. The
spectral method is particularly useful when many different linear
combinations of the same set of polynomials must be calculated, because
the spectra are to be formed only once and then retained in the memory.

If the spectra are too long to enter into the machine, they can be
cut and calculated part by part.

Now we consider the application of the spectral method to the
calculation of the product of two polynomials:

    $P_1(x) = \sum_{i=0}^{n_1} a_i x^i$,   $P_2(x) = \sum_{i=0}^{n_2} b_i x^i$.

As before, the main problem is to determine the rhythm of the spectra.
Let $A_1 = \max |a_i|$, $A_2 = \max |b_i|$, $n = 1 + \max(n_1, n_2)$. Then it is
easy to show that $\delta = 2n A_1 A_2$ is the indicatory number (and therefore
$(\delta)$ is the precise rhythm) and that

    $h = (A_1) + (A_2) + [\log 2n] + 1$

is a rough rhythm.

Example. Multiply the polynomials:

    $P_1(x) = 46x^2 - 54x + 48$,  $P_2(x) = 26x^2 + 21x - 23$.

For this example $\delta$ = 8424, which leads to the rhythm h = 4. In
the rough way, we get the wider rhythm h = 5. Taking h = 4, we obtain

    $S_1$ = 45|9946|0048,   $S_2$ = 26|0020|9977,

    $S = S_1 S_2$ = 1195|9561|9056|2249|8896.

This spectrum leads to the desired product:

    $P(x) = 1196x^4 - 438x^3 - 944x^2 + 2250x - 1104$.
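The multiplication example reduces to a single integer product, which the following Python sketch reproduces. The helpers `spectrum` and `coefficients` are hypothetical names, and the signed reading of sections is an assumption consistent with the spectra printed in the example.

```python
def spectrum(coeffs, h):
    """Decreasing spectrum P(10**h); coeffs run from the leading term down."""
    s = 0
    for c in coeffs:
        s = s * 10**h + c
    return s


def coefficients(s, h, k):
    """Read k signed sections of width h back from the spectrum s."""
    base = 10**h
    out = []
    for _ in range(k):
        r = s % base
        if r >= base // 2:      # section stands for a negative coefficient
            r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


h = 4                                   # precise rhythm: (2 * 3 * 54 * 26) = (8424)
s1 = spectrum([46, -54, 48], h)
s2 = spectrum([26, 21, -23], h)
s = s1 * s2                             # the only multiplication required
product = coefficients(s, h, 5)         # degree 4, hence five sections
```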

The  advantage  of  using  spectra  in  this  example  is  obvious. 

Instead of 9 multiplications and 4 additions and the writing of
intermediary results, we have only one multiplication of spectra and the
determination of the rhythm. The spectra can be formed directly by
typing the digits into the calculating machine. Note that there is
an additional advantage in the spectral method in that the rhythm is
calculated only once and may be used for a large number of calculations
of the same kind. There is no need to copy down the resulting spectrum
from the machine, since the resulting polynomial can be written directly,
so that spectra occur only in the calculating machine.

3.3.  ADVANTAGES  OF  SPECTRAL  CALCULATIONS.  The  last  example  gives 
us  the  possibility  to  formulate  some  advantages  of  the  spectral  method 
which  are  more  or  less  valid  in  all  spectral  calculations. 

For  desk  calculators  these  advantages  are: 

1.  The  number  of  operations  is  considerably  reduced. 

2.  The  writing  of  intermediary  results  is  reduced  or 

eliminated  completely. 

3.  The  scheme  of  calculating  is  much  simpler. 

4.  The  number  of  possible  mistakes  is  reduced. 

For  computers  the  advantages  are: 

1.  The  number  of  operations  is  much  less. 

2.  The  program  is  simpler. 

3.  The  required  memory  capacity  is  considerably  less. 
4. APPLICATIONS OF NUMERICAL SPECTRA TO ARITHMETIC.

4.1. REARITHMETIZATION OF ARITHMETIC.

The application of spectra to algebra was described earlier as
an arithmetization of the algebraic problems. So the application of
spectra to arithmetic seems, at first glance, to be meaningless.
To arithmetize arithmetic may appear to be of no use. But this is not
the case. To explain the usefulness of the application of numerical
spectra to arithmetic, we consider the problem of finding the following
sums:

    $\sum_{j=1}^{n} a_j$,   $\sum_{j=1}^{n} b_j$,   $\sum_{j=1}^{n} c_j$.

Each sum is obtained by n-1 additions; thus the total number of
arithmetical operations is 3(n-1). The numbers $a_j$, $b_j$, $c_j$ of the same
index can be considered as the coefficients of the polynomial

    $P_j(x) = a_j + b_j x + c_j x^2$,   $j = 1, 2, \ldots, n$.

By adding these polynomials, we obtain

    $P(x) = \sum_{j=1}^{n} P_j(x) = \sum_{j=1}^{n} a_j + x \sum_{j=1}^{n} b_j + x^2 \sum_{j=1}^{n} c_j$.

The coefficients of the resulting polynomial are just the required
sums. Thus the required sums are obtained by means of n-1 algebraic
operations. This apparent decrease in the number of operations is
surely only formal, because in ordinary calculations we must perform
the same number 3(n-1) of arithmetical operations. But by means of
numerical spectra every polynomial $P_j(x)$ is replaced by its spectrum
$S_j$, and we have to perform n-1 additions of spectra. This means that
we have to perform only n-1 arithmetical operations to obtain the
resulting spectrum $S = S_1 + \cdots + S_n$. It is evident that the spectrum
S is just the spectrum of the polynomial P(x). The values of the
sections of the spectrum S are the values of the required sums.

Thus this rearithmetization of arithmetic happens to be useful
in reducing considerably the number of operations. The use of algebra
is only necessary for the justification of several spectral processes.
The spectrum $S_j$ of the numbers $a_j$, $b_j$, $c_j$ with the same index can be
obtained directly without recourse to any algebra.


We  now  turn  to  systematic  formulation  of  the  spectral  method  in 
arithmetic. 




4.2. ADDITION AND SUBTRACTION. The problem of addition was
elaborated in Section 4.1 as an example of the advantages of the spectral
method in arithmetic. It remains only to find a compatible rhythm for
this operation. Let $a_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, be a given set
of positive numbers. We want to find the sums

    $K_j = \sum_{i=1}^{m} a_{ij}$,   $j = 1, \ldots, n$.

The indicatory number is $\delta = ma$, where $a = \max a_{ij}$. The rhythm h
obtained in the rough way is $h = (a) + [\log m] + 1$. For each
$i = 1, \ldots, m$, let $S_i$ denote the spectrum of the sequence
$a_{i1}, a_{i2}, \ldots, a_{in}$. Then the values of the sections of the spectrum
$S = \sum_{i=1}^{m} S_i$ are the required sums $K_j$, $j = 1, \ldots, n$.

In the case of subtraction the number n remains arbitrary and m = 2.
The indicatory number is $\delta = a$ and the rough rhythm is h = (a). Thus
subtraction in spectral arithmetic is a simpler operation than
addition. It may be noted that the advantage of using spectra for
addition and subtraction is not as great as it is in the case of
multiplication or in problems involving both addition and multiplication.


Example: (a). Perform the following additions:

    1628   2834   1986   1153   2020
     839   1723   1312    992    813
    1213    752    678    523   1412

Here a = 2834, m = 3, $\delta$ = 8502, h = 4. The same rhythm is obtained
in the rough way.

    $S_1$ = 1628|2834|1986|1153|2020
    $S_2$ =  839|1723|1312|0992|0813
    $S_3$ = 1213|0752|0678|0523|1412
    S   = 3680|5309|3976|2668|4245

(b). Perform the following subtractions:

    383 - 168,  96 - 57,  215 - 182,  312 - 89.

    a = 383, (a) = 3, h = 3.

    $S_1$ = 383|096|215|312
    $S_2$ = 168|057|182|089

    $S = S_1 - S_2$ = 215|039|033|223.

The sections give the desired answers.
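Both parts of this example can be checked with a small Python sketch. The helper names are hypothetical; sections are read as plain non-negative numbers, which suffices here because every sum and difference is non-negative and fits in the rhythm.

```python
def ordinary_spectrum(seq, h):
    """Write the terms of seq side by side, each in a section of h digits."""
    s = 0
    for v in seq:
        s = s * 10**h + v
    return s


def split_sections(s, h, k):
    """Cut a non-negative spectrum into k sections of h digits, left to right."""
    base = 10**h
    out = []
    for _ in range(k):
        s, r = divmod(s, base)
        out.append(r)
    return out[::-1]


# (a) three rows of five numbers, rhythm h = 4: two additions give five sums.
rows = [
    [1628, 2834, 1986, 1153, 2020],
    [839, 1723, 1312, 992, 813],
    [1213, 752, 678, 523, 1412],
]
total = sum(ordinary_spectrum(r, 4) for r in rows)
column_sums = split_sections(total, 4, 5)

# (b) four subtractions with rhythm h = 3: one subtraction of spectra.
diff = ordinary_spectrum([383, 96, 215, 312], 3) - ordinary_spectrum([168, 57, 182, 89], 3)
differences = split_sections(diff, 3, 4)
```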

4.3. SCALAR MULTIPLICATION AND DISTRIBUTIVE MULTIPLICATION.

We consider the operation of multiplying each number of a given
sequence $a_1, \ldots, a_n$ of positive numbers by a fixed positive number b.
It is easy to see that the indicatory number is $\delta = ab$, where
$a = \max a_j$. The rhythm in the rough way is h = (a) + (b). The values
of the sections of the resulting spectrum Sb give the desired sequence
$ba_1, \ldots, ba_n$.

Example: Multiply each of the numbers 68, 105, 87 by 93.

Here a = 105, $\delta$ = 9765, h = 4. In the rough way the rhythm is 5.
Using the precise rhythm we get

    S   = 68|0105|0087
    b   = 93
    S·b = 6324|9765|8091
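A minimal Python sketch of this scalar multiplication, assuming positive terms so that sections can be read back without sign handling; the helper names are hypothetical.

```python
def ordinary_spectrum(seq, h):
    """Write the terms of seq side by side, each in a section of h digits."""
    s = 0
    for v in seq:
        s = s * 10**h + v
    return s


def split_sections(s, h, k):
    """Cut a non-negative spectrum into k sections of h digits, left to right."""
    base = 10**h
    out = []
    for _ in range(k):
        s, r = divmod(s, base)
        out.append(r)
    return out[::-1]


h = 4                                   # (105 * 93) = (9765) has four digits
s = ordinary_spectrum([68, 105, 87], h)
products = split_sections(s * 93, h, 3) # one multiplication yields all products
```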

We now consider the operation of multiplication of each number of a
sequence with each number of another sequence (distributive
multiplication). Let $a_i$, $i = 1, \ldots, n$, and $b_j$, $j = 1, \ldots, k$, be given
sequences of positive numbers. We elongate the first sequence by putting
(k-1) zeros between each two terms. We then form the ordinary spectrum
$S_1$ of the elongated sequence and the ordinary spectrum $S_2$ of the second
sequence. Then $S = S_1 S_2$ gives the spectrum of the desired sequence
of products. Note that the indicatory number is $\delta = ab$, where
$a = \max a_i$, $b = \max b_j$, and the rhythm in the rough way is h = (a) + (b).

Example. Multiply each number of the sequence 18, 16, 11 by the numbers
47, 33, 28.

In this example, n = k = 3, a = 18, b = 47, $\delta$ = 846, h = 3 (in the
rough way it would be h = 4).

    $S_1$ = 18|000|000|016|000|000|011
    $S_2$ = 47|033|028
    $S = S_1 S_2$ = 846|594|504|752|528|448|517|363|308

Hence the desired products are given by the sections in the same order
from left to right, for example 18 x 47 = 846, 18 x 33 = 594, 18 x 28 = 504,
etc.
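The elongation step is the only new ingredient here, and a short Python sketch makes it concrete. The names `elongate` and `distributive_products` are hypothetical, and positive inputs are assumed.

```python
def ordinary_spectrum(seq, h):
    """Write the terms of seq side by side, each in a section of h digits."""
    s = 0
    for v in seq:
        s = s * 10**h + v
    return s


def split_sections(s, h, k):
    """Cut a non-negative spectrum into k sections of h digits, left to right."""
    base = 10**h
    out = []
    for _ in range(k):
        s, r = divmod(s, base)
        out.append(r)
    return out[::-1]


def elongate(seq, k):
    """Insert k - 1 zeros between each two consecutive terms of seq."""
    out = []
    for i, v in enumerate(seq):
        if i:
            out.extend([0] * (k - 1))
        out.append(v)
    return out


def distributive_products(a, b, h):
    """All products a_i * b_j, ordered by i and then by j, from one multiplication."""
    s1 = ordinary_spectrum(elongate(a, len(b)), h)
    s2 = ordinary_spectrum(b, h)
    return split_sections(s1 * s2, h, len(a) * len(b))


products = distributive_products([18, 16, 11], [47, 33, 28], 3)
```

The zeros leave exactly enough empty sections for the k products of each term, so no two products overlap.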


4.4. CALCULATIONS WITH FRACTIONS.

The sum of positive fractions

    $\frac{a_1}{b_1} + \frac{a_2}{b_2} + \cdots + \frac{a_n}{b_n}$        (4.1)

may be obtained by the spectral method as follows.

We form the ordinary spectra of these fractions, $S_1, S_2, \ldots, S_n$,
with the rhythm

    $h = [\log n A^n] + 1$        (4.2)

where $A = \max(a_i, b_i)$, $i = 1, \ldots, n$. Then the last section of the
product spectrum $S = S_1 S_2 \cdots S_n$ is the denominator and the penultimate
section is the numerator of the sum of fractions. The calculation of
the product spectrum may be simplified by retaining after each
multiplication the last two sections only.

If some of the terms of the expression (4.1) are negative, then
we associate the negative sign with the denominators of these terms;
in such cases of mixed addition and subtraction the rhythm is slightly
changed:

    $h = [\log 2n A^n] + 1$,   $A = \max(|a_i|, |b_i|)$        (4.3)

The application of spectra always reduces the number of operations.
Ordinarily the sum (4.1) is calculated by means of n divisions and n - 1
additions. Using the spectral method the result, in the form of a
decimal number, is obtained by means of n multiplications and one
division only. For computers the time for the operation of division
is much greater than the time for multiplication. Thus the saving of
time will be considerable.

Example:

    $\frac{19}{31} + \frac{27}{28} + \frac{21}{17}$

Here h = 5, $S_1$ = 19|00031, $S_2$ = 27|00028,

    $S_1 S_2$ = ...|01369|00868,   $S_3$ = 21|00017,

    $S_1 S_2 S_3$ = ...|41501|14756.

Thus the sum is

    $\frac{41501}{14756} = 2.8125$.

Note that as a by-product we obtain the sum of the first two fractions.

In spite of the great saving of time, the method seems to be not
very helpful, because calculation with ordinary fractions is very
seldom needed. But every division of decimal numbers may be regarded
as an ordinary fraction. Also it is possible to form spectra from decimal
numbers directly. Thus $\frac{a}{b}$ represents the division of decimal numbers,
and the expression (4.1) is a combination of divisions and additions,
just as the dot product is a combination of multiplications and additions.

It is easy to see that the main rhythm for such a spectrum is

    $h = [\log 2n A^n] + 1$

where $A = \max(10^{r_i} |a_i|, 10^{s_i} |b_i|)$, and $r_i$ (resp. $s_i$) is the number of
digits in $a_i$ (resp. $b_i$). The point rhythm is

    $p = [\log B] + 1$

where $B = \max(10^{r_i} |a_i|', 10^{s_i} |b_i|')$ and $u'$ denotes the decimal part of u.

Example:

    $\frac{12.19}{8.12} + \frac{6.14}{10.25}$

For this case, n = 2, h = 8, p = 2.

    $S_1$ = 12.19|000008.12
    $S_2$ = 6.14|000010.25

    $S_1 S_2$ = ...|0174.8043|0083.2300

Thus the result is $\frac{174.8043}{83.2300} = 2.1003$.
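The integer-fraction example of this section, including the simplification of keeping only the last two sections after each multiplication, can be sketched as follows. The function name is hypothetical and positive fractions are assumed; the result is returned unreduced, exactly as the sections deliver it.

```python
def add_fractions_spectral(fractions, h):
    """Sum positive fractions (a, b) by multiplying their two-section spectra.

    After every multiplication only the last two sections are retained:
    the penultimate one is the running numerator, the last one the
    running denominator.
    """
    base = 10**h
    num, den = fractions[0]
    for a, b in fractions[1:]:
        prod = (num * base + den) * (a * base + b)
        den = prod % base                # last section
        num = (prod // base) % base      # penultimate section
    return num, den


# Rhythm (4.2): n = 3, A = 31, n * A**n = 89373, so h = [log 89373] + 1 = 5.
first_two = add_fractions_spectral([(19, 31), (27, 28)], 5)
num, den = add_fractions_spectral([(19, 31), (27, 28), (21, 17)], 5)
value = num / den                        # the single division
```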


4.5.  INTERVAL  ARITHMETIC  AND  NUMERICAL  SPECTRA 

For given real numbers a, b with $a \le b$, the set $\{x : a \le x \le b\}$
is called an interval number and is denoted by [a:b]. Note that the
degenerate case a = b is included. Interval arithmetic is an arithmetic
system which uses interval numbers as elements. Interval arithmetic
operations are defined by

    $[a:b] \circ [c:d] = \{z : z = x \circ y,\ a \le x \le b,\ c \le y \le d\}$

where $\circ$ denotes addition, subtraction, multiplication or division
(division by an interval containing zero is excluded). Since the
ordinary arithmetic operations are continuous, they map the cartesian
product [a:b] x [c:d], which is a compact connected set, onto a compact
connected set, i.e., a closed real interval. In fact, the
following formulae can be easily deduced:

    $[a:b] + [c:d] = [a+c : b+d]$
    $[a:b] - [c:d] = [a-d : b-c]$
    $[a:b] \cdot [c:d] = [\min(ac, ad, bc, bd) : \max(ac, ad, bc, bd)]$        (4.4)
    $[a:b] \div [c:d] = [a:b] \cdot [1/d : 1/c]$

Thus for interval multiplication, four ordinary multiplications are to
be performed in the case a < 0 < b, c < 0 < d, whereas two
multiplications suffice for the remaining cases.
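The formulae (4.4) translate directly into code. The sketch below represents an interval [a:b] as a Python tuple `(a, b)`; the function names are hypothetical, rounding effects are ignored, and the product always takes the minimum and maximum of all four endpoint products rather than distinguishing the two-multiplication cases.

```python
def i_add(x, y):
    """[a:b] + [c:d] = [a+c : b+d]"""
    return (x[0] + y[0], x[1] + y[1])


def i_sub(x, y):
    """[a:b] - [c:d] = [a-d : b-c]"""
    return (x[0] - y[1], x[1] - y[0])


def i_mul(x, y):
    """[a:b] . [c:d] = [min : max] over the four endpoint products."""
    p = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(p), max(p))


def i_div(x, y):
    """[a:b] / [c:d] = [a:b] . [1/d : 1/c]; zero must not lie in [c:d]."""
    if y[0] <= 0 <= y[1]:
        raise ZeroDivisionError("divisor interval contains zero")
    return i_mul(x, (1 / y[1], 1 / y[0]))
```

For instance, `i_mul((-1, 2), (-3, 4))` is a case where both factors contain zero in their interior, so all four products are needed.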


It follows from these formulae that under the correspondence
$[a:a] \leftrightarrow a$, the ordinary arithmetic of real numbers is included in
interval arithmetic if one makes the identification a = [a:a]. For
properties of interval arithmetic, see, for instance, [11], [12], [22].

It should be noted that numbers that arise in applications are
actually interval numbers due to error in experimental initial data or
round off in performing arithmetic operations. It is not surprising
therefore that interval arithmetic should find its way, as it has
recently [6], to elementary calculus books.

Interval arithmetic plays an important role in the automatic
analysis and control of error in digital computations [11] and provides
a convenient setting for the accurate numerical integration of ordinary
differential equations [12]. It may also be used advantageously in
the programming of Newton's method for automatic computation [22].

An extensive bibliography on the theory and applications of interval
arithmetic and interval functions is given in [11], [12]. In this
section we shall show that numerical spectra can be used to advantage
in interval arithmetic.

It follows from the above that addition of n intervals $[a_i : b_i]$ reduces
to finding certain linear combinations of numbers, which was elaborated
earlier. Thus the indicatory number is $\delta = 2na$, where $a = \max(|a_i|, |b_i|)$,
and the rhythm in the rough way is $h = (a) + [\log 2n] + 1$. The spectral
process is to form the ordinary spectrum S of the sequence $a_1, b_1, \ldots, a_n, b_n$
with rhythm h and to multiply S by a number consisting of n digits
1 with 2h - 1 digits 0 between every two 1's. This gives a
spectrum with an even number of sections; the effective values of the
two middle sections are the beginning and the end of the sum of
intervals.

Example: [8:16] + [-5:8] + [-16:-3]. Here h = 2.

    S = 8|15|95|07|83|97
        x 1 0001 0001
      = 8|16|03|23|87|20|79|04|83|97

The effective values of the middle sections are -13, 21, which
give the required interval sum. Note that the resulting spectrum also
gives the sum of the first two (last two) intervals, which may be read
from the third and fourth (resp. seventh and eighth) sections of the
resulting spectrum. The addition of four intervals using spectra
gives five results and so on.
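The interval-sum example can be reproduced with the sketch below. The helper names are hypothetical, and the multiplier is built as the ordinary spectrum of the sequence 1, 0, 1, ..., 1 (2n - 1 terms), which for h = 2 and n = 3 is the number 1 0001 0001 used in the example.

```python
def ordinary_spectrum(seq, h):
    """Evaluate the sequence as a polynomial in 10**h (signed terms allowed)."""
    s = 0
    for v in seq:
        s = s * 10**h + v
    return s


def signed_sections(s, h, k):
    """Effective (signed) values of the k sections of width h, left to right."""
    base = 10**h
    out = []
    for _ in range(k):
        r = s % base
        if r >= base // 2:
            r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


def interval_sum(intervals, h):
    """Sum of n intervals read from the two middle sections of one product."""
    n = len(intervals)
    flat = [v for ab in intervals for v in ab]            # a1, b1, ..., an, bn
    ones = ordinary_spectrum([1, 0] * (n - 1) + [1], h)   # 1, 0, 1, ..., 1
    secs = signed_sections(ordinary_spectrum(flat, h) * ones, h, 4 * n - 2)
    return secs[2 * n - 2], secs[2 * n - 1], secs


lo, hi, sections = interval_sum([(8, 16), (-5, 8), (-16, -3)], 2)
```

The neighbouring sections hold partial sums: here `sections[2:4]` is the sum of the first two intervals and `sections[6:8]` that of the last two.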

Subtraction of intervals can be transformed into addition:

    [a:b] - [c:d] = [a:b] + [-d:-c]

The advantages of spectral addition and subtraction of intervals
are limited to desk calculators. Spectra are particularly useful when
all, or most of, the obtained interval sums are needed. This advantage
can be improved by the appropriate arrangement of the intervals, since
the commutative and associative laws are valid for interval addition.

In view of the formula for the interval product given in (4.4), the
spectral process for obtaining such a product was elaborated earlier
(Section 4.3). This product may also be obtained by use of inverse
spectra, which would avoid the negativity of spectra. In this case
we must elongate the first sequence in the middle by one zero. The
interval product is then given by the maximum and minimum values of
$\Sigma_1 \Sigma_2$, where $\Sigma_1$ and $\Sigma_2$ are the inverse spectra of the first and second
intervals.


4.6. CHECK UP OF THE RESULTS. We shall confine the check up by
spectra to the traditional tests by 9 and by 11. The check up by 9
consists of finding the sums $s_i = s_i(a_i)$ of the digits of each number
$a_i$ and of performing all the required operations with $t_i$, where
$t_i \equiv s_i \pmod 9$, instead of $a_i$. To perform this check up by means of
spectra, we have to use spectra to calculate each of the numbers $s_i$.
This may be obtained as a dot product, where one of the n-dimensional
vectors has all its coordinates equal to one. We use the rhythm h = 2
and multiply the spectrum of a number $a_i$ (having n digits) by the
spectrum of the sequence 1, 0, 1, 0, ..., 1 of 2n - 1 terms. The
middle part M(S) of the resulting spectrum gives the corresponding
number $s_i$.

The check up by 11 is performed in the same way except that, instead
of using the spectrum of the sequence 1, 0, 1, 0, ..., 1, we use the
spectrum of the sequence 1, -1, 1, -1, ..., 1 of 2n - 1 terms.

5. APPLICATIONS OF NUMERICAL SPECTRA TO COMPUTATIONAL METHODS
OF LINEAR ALGEBRA.

5.1. COMPUTATION OF INNER PRODUCT USING SPECTRA.

Let a and b be two vectors in $R^n$ with components $a_j$ and $b_j$,
$j = 1, \ldots, n$. We introduce the auxiliary polynomials

    $P(x) = \sum_{j=1}^{n} a_j x^{n-j}$,   $Q(x) = \sum_{j=1}^{n} b_j x^{j}$

and

    $R(x) = P(x) Q(x) = \sum_{j=1}^{2n-1} c_j x^{j}$.

[...]

    $a \cdot b = M(S_a \Sigma_b)$

is the formula for the computation of the standard inner product in
$R^n$ using the spectral method.
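A minimal Python sketch of the inner product as the middle section of a product of two spectra. It assumes that one factor is the ordinary spectrum of a and the other the inverse (reversed) spectrum of b, and that the rhythm is taken as the digit count of 2nαβ with α = max|a_j| and β = max|b_j|; the function names are hypothetical.

```python
def ordinary_spectrum(seq, h):
    """Evaluate the sequence as a polynomial in 10**h, first term leading."""
    s = 0
    for v in seq:
        s = s * 10**h + v
    return s


def signed_sections(s, h, k):
    """Effective (signed) values of the k sections of width h, left to right."""
    base = 10**h
    out = []
    for _ in range(k):
        r = s % base
        if r >= base // 2:
            r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


def inner_product(a, b, h):
    """Middle section of (ordinary spectrum of a) * (inverse spectrum of b)."""
    n = len(a)
    inverse_b = ordinary_spectrum(b[::-1], h)   # inverse spectrum: reversed order
    prod = ordinary_spectrum(a, h) * inverse_b
    return signed_sections(prod, h, 2 * n - 1)[n - 1]


# n = 3, alpha = 3, beta = 6: 2 * 3 * 3 * 6 = 108 has three digits, so h = 3.
d1 = inner_product([1, 2, 3], [4, 5, 6], 3)
d2 = inner_product([2, -3, 1], [4, 5, -6], 3)
```

The product polynomial has 2n - 1 coefficients, and the middle one collects exactly the terms $a_j b_j$.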


5.2. MATRIX ALGEBRA USING SPECTRA. Let A and B be two m x n matrices.
To compute A + B by the spectral method, using spectra of the matrices
A and B, it is necessary that the spectra be of the same kind (that is,
both by rows or both by columns) and be obtained by means of the same
rhythm h. If $h_A$ is a rhythm compatible with the matrix A and $h_B$ with
the matrix B, then

    $h = \max(h_A, h_B) + 1$

is a compatible rhythm with A + B. Another value of the rhythm may be
obtained from the values of the coefficients [...] $\alpha = \max |a_{ij}|$,
$\beta = \max |b_{ij}|$ and $\gamma = \max(\alpha, \beta)$ [...] and the rough method [...].

Let A be an m x n matrix and B be an n x p matrix. There are [...]
the product C = AB using spectra.

1. Vectorial Method. The element $c_{ij}$ in the matrix C is obtained
as the dot product of the ith row vector of A with the jth column
vector of B. Thus we have to form the inverse spectrum of the ith
row vector and to multiply it successively with the ordinary spectra
of the vectors of the first, second, ..., nth column. The middle part
of such products will give us all the elements of the ith row of the
resulting matrix. The indicatory number is $\delta = 2n\alpha\beta$ and the rough
rhythm is $h = (\alpha) + (\beta) + 1$, where $\alpha$ and $\beta$ are the same as defined above.

2. First Matrix Method. After determining the rhythm h, we take
H = (2n - 1)h and form the ordinary double rhythm row spectrum of the
first matrix and multiply it with the inverse spectrum of the jth column
of B, j = 1,...,n. This product is partitioned into sections by the
rhythm H and then into subsections by the rhythm h. Thus every section,
being a spectrum, has its middle part. The effective values of the
middle parts of all the sections are the elements of the jth column
of the product AB.

3. Second Matrix Method. By this method, all the elements of
the product AB are obtained by one multiplication. For this method we
need three rhythms: h and H as defined previously, and H' = pH. We
form the double rhythm spectrum $S_A$ of the matrix A with rhythms H' and h.
We also form the inverse spectra $\Sigma_j$ of the columns of B with rhythm h.
Finally we form the ordinary spectrum $S_\Sigma$ of this sequence of numbers
$\Sigma_1, \Sigma_2, \ldots, \Sigma_p$ with rhythm H. We partition $S_A S_\Sigma$ into sections by
the rhythm H' and then partition each section into subsections by the
rhythm H and finally partition each subsection into parts by the rhythm
h. Each subsection, being itself a spectrum, has a middle part. The
effective values of these middle parts give the elements of AB arranged
row by row from left to right.


Example.

    $A = \begin{pmatrix} 1 & 0 & 2 \\ 1 & 3 & 0 \end{pmatrix}$,   $B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 3 & 1 \end{pmatrix}$

The indicatory number is obtained using (5.1): $\delta$ = 54, h = 2 and H = 10.
The ordinary spectrum of the matrix A is S = 1|00|02|00|00|01|03|00. The
inverse spectra of the first and second column vectors are
$\Sigma_1$ = 3|00|01, $\Sigma_2$ = 1|01|02.

    $S\Sigma_1$ = 3|00|07|00|02|03|09|01|03|00
    $S\Sigma_2$ = 1|01|04|02|04|01|04|05|06|00

The resulting matrix is $\begin{pmatrix} 7 & 4 \\ 1 & 5 \end{pmatrix}$.
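The First Matrix Method can be sketched in Python as follows. The packing of the rows (each row spectrum placed in a section of H = (2n - 1)h digits) follows the example's spectrum of A; the function names are hypothetical, and signed reading of the subsections is an assumption that also allows negative entries when the rhythm is large enough.

```python
def ordinary_spectrum(seq, h):
    """Evaluate the sequence as a polynomial in 10**h, first term leading."""
    s = 0
    for v in seq:
        s = s * 10**h + v
    return s


def signed_sections(s, h, k):
    """Effective (signed) values of the k sections of width h, left to right."""
    base = 10**h
    out = []
    for _ in range(k):
        r = s % base
        if r >= base // 2:
            r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


def matmul_first_matrix_method(A, B, h):
    """Product AB with one multiplication of spectra per column of B."""
    n = len(B)                                   # inner dimension
    width = 2 * n - 1                            # subsections per section
    H = width * h
    s_a = ordinary_spectrum([ordinary_spectrum(row, h) for row in A], H)
    C = [[0] * len(B[0]) for _ in A]
    for j, col in enumerate(zip(*B)):
        sigma_j = ordinary_spectrum(list(col)[::-1], h)   # inverse spectrum
        subs = signed_sections(s_a * sigma_j, h, len(A) * width)
        for i in range(len(A)):
            C[i][j] = subs[i * width + (n - 1)]  # middle part of section i
    return C


A = [[1, 0, 2], [1, 3, 0]]
B = [[1, 2], [0, 1], [3, 1]]
C = matmul_first_matrix_method(A, B, 2)
```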


5.3. EVALUATION OF DETERMINANTS AND SOLUTION OF LINEAR
ALGEBRAIC EQUATIONS USING SPECTRA.

We shall apply the spectral method to calculate a determinant D of order n, whose coefficients are integers or decimal numbers. For purposes of the spectral method, it is advantageous to consider D as the determinant of the n x n matrix $A = [a_{ij}]$. We associate with the matrix A a column matrix C whose ith row element is the ordinary spectrum $S_i$ of the ith row of A.

Applying the "pivot" method to the element $a_{1n} \neq 0$ (otherwise rearrange rows), we obtain

$$\det A = \frac{(-1)^{n+1}}{a_{1n}^{\,n-2}} \det A' \tag{5.3}$$

where the elements of the matrix A' are given by

$$a'_{jk} = a_{1n} a_{j+1,k} - a_{1k} a_{j+1,n}, \qquad j, k = 1, 2, \ldots, n-1.$$

We also associate with the matrix A' a column matrix $(\tilde S'_1, \ldots, \tilde S'_{n-1})$, whose ith row element is $\tilde S'_i$, where ~ denotes the ablation of the last section of the ordinary spectrum $S'_i$ of the ith row of A'. It follows readily that

$$S'_{i-1} = a_{1n} S_i - a_{in} S_1, \qquad i = 2, \ldots, n. \tag{5.4}$$

The last section of each $S'_{i-1}$ is zero, and it is this section that the ablation removes.

Thus to the reduction of the determinant of order n to a determinant of order n - 1 (i.e., from det A to det A'), there corresponds the transformation of the column matrix $(S_1, \ldots, S_n)$ into the column matrix

$$\frac{(-1)^{n+1}}{a_{1n}^{\,n-2}} \, (\tilde S'_1, \ldots, \tilde S'_{n-1}),$$

where $a_{1n}$ is the effective value of the last section of the spectrum $S_1$. Continuing inductively in this way we obtain a column matrix composed of one number. It is readily seen that this number is equal to det A.

It should be noted that in the ordinary method the reduction of a determinant of order n to one of order n - 1 requires a total of $2n^2 - 3n + 1$ operations, whereas the spectral method requires only 5n - 4 operations, the formation of each spectrum being counted as one operation.

Example: Evaluate using spectra:

$$\det A = \begin{vmatrix} 2 & 3 & 1 & 2 \\ 4 & 2 & -1 & 3 \\ 5 & -2 & -3 & 1 \\ 2 & 1 & 3 & 2 \end{vmatrix}$$

The indicatory number for forming spectra is determined by $a = \max |a_{ij}|$. Note that the rhythm so obtained is compatible with all the spectra occurring in the calculation of the determinant.

Using the rhythm h = 4, we obtain

2|0003|0001|0002
4|0001|9999|0003
4|9997|9997|0001
2|0001|0003|0002

The first transformation leads to the following column matrix:

1|9994|9995
7|9992|9993
-0|0003|9996

Continuing we obtain det A = -52.
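The reduction (5.3)-(5.4) lends itself to a short program. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the accumulated factor is kept as an exact `fractions.Fraction` (a choice made here, not taken from the paper), and the rhythm h is assumed to stay compatible with every intermediate row.

```python
from fractions import Fraction


def spectrum(values, h):
    """Ordinary spectrum of a row: one h-digit section per element."""
    s = 0
    for v in values:
        s = s * 10**h + v
    return s


def last_section(s, h):
    """Effective value of the last (rightmost) section of a spectrum."""
    r = s % 10**h
    return r - 10**h if r >= 10**h // 2 else r


def spectral_det(rows, h):
    S = [spectrum(row, h) for row in rows]
    factor = Fraction(1)
    while len(S) > 1:
        n = len(S)
        # Rearrange rows so that the pivot a_1n is non-zero.
        for i, s in enumerate(S):
            if last_section(s, h) != 0:
                if i != 0:
                    S[0], S[i] = S[i], S[0]
                    factor = -factor          # a row swap changes the sign
                break
        else:
            return 0                          # the last column is zero
        a1n = last_section(S[0], h)
        factor *= Fraction((-1) ** (n + 1), a1n ** (n - 2))    # (5.3)
        # (5.4): combine the rows, then ablate the zero last section.
        S = [(a1n * S[i] - last_section(S[i], h) * S[0]) // 10**h
             for i in range(1, n)]
    det = factor * S[0]
    assert det.denominator == 1
    return int(det)


M = [[2, 3, 1, 2], [4, 2, -1, 3], [5, -2, -3, 1], [2, 1, 3, 2]]
print(spectral_det(M, 4))   # -52
```

Each row update is a single linear combination of two long integers, which is where the operation count of the spectral method comes from.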


Numerical spectra can be used advantageously in direct and iterative methods for solving a system of linear algebraic equations (see, for instance, [3], [4], [12]). The advantage accrues from the fact that these methods use operations (dot product, addition and multiplication of matrices, linear combinations, evaluation of determinants, etc.) which were treated by spectra in the previous sections. Thus the spectral method can be used in the Gauss-Jordan elimination procedure, the orthonormalization procedure, etc. The determination of the rhythms does not present any difficulties.

6. APPLICATION OF NUMERICAL SPECTRA TO
EQUATIONS - GRAEFFE'S METHOD

The well-known Graeffe's method ([3], [...]; the last reference contains an extensive bibliography) can be simplified by means of numerical spectra. The simplification can be made in the first part of Graeffe's method, that is, in the transformation of the polynomial equation. Two spectral methods are given in the present paper.

6.1. FIRST SPECTRAL METHOD.

The first step in Graeffe's method is to transform the polynomial P(x) of the equation

$$P(x) = \sum_{i=0}^{n} a_i x^{n-i} = 0$$

into the polynomial $P_1(x)$ having as zeros the squares of the zeros of the previous polynomial P(x). The spectra to be used are the following: the ordinary decreasing spectrum S of the polynomial P(x), which is given by

$$S = P(10^h) \tag{6.1}$$

with the compatible rhythm h, and the corrected spectrum $\bar S$ on all the even places of the same polynomial P(x). This spectrum $\bar S$ is in fact the ordinary decreasing spectrum of the corrected polynomial $\bar P(x)$ on all the even places of P(x): $\bar P(x) = (-1)^n P(-x)$. Thus

$$\bar S = \bar P(10^h) = (-1)^n P(-10^h) \tag{6.2}$$


We now state the fundamental theorem in the spectral processes of the transformation of the equations.

Theorem 6.1. Let S be the ordinary decreasing spectrum of the polynomial P(x), $\bar S$ be the corrected decreasing spectrum of the same polynomial, both spectra being formed with rhythm h, and let $S_1$ be the ordinary decreasing spectrum of Graeffe's transform, with rhythm 2h. Then

$$S_1 = S \bar S \tag{6.3}$$

The rhythm h is given by the formula

$$h = [\log a(n+1)] + 1 \tag{6.4}$$

where $a = \max_i |a_i|$ in case all the coefficients $a_i$ are integers, and $a = \max_i |a_i \cdot 10^{r_i}|$ in case some of the coefficients are decimal numbers ($r_i$ is the number of decimals of $a_i$). The point rhythm in the latter case is given by $p = [\log b] + 1$, where $b = \max_i a_i \cdot 10^{\ldots}$.

Proof: It is known that the relation between Graeffe's transform polynomial $P_1(x)$ and the polynomial P(x) is

$$P_1(x^2) = (-1)^n P(x) P(-x).$$

Let h be a positive integer such that the rhythm h is compatible with P(x) and the rhythm 2h is compatible with $P_1(x)$. Putting $x = 10^h$ in the above relation and using (6.1) and (6.2) we obtain $S_1 = S \bar S$. This proves (6.3).

Now we prove that the value h given by (6.4) is a compatible rhythm. For a rhythm h to be compatible with the polynomial P(x), it is sufficient that h satisfies the inequality

$$h > \log a + \log 2 \tag{6.5}$$

For the rhythm 2h to be compatible with the polynomial $P_1(x)$, it is sufficient that h satisfies the inequality $10^{2h} > 2(n+1)a^2$, which leads to the inequality

$$h > \log a + \tfrac{1}{2} \log 2(n+1) \tag{6.6}$$

The value h given by (6.4) satisfies $h > \log a + \log (n+1)$, and for $n \geq 1$ we have $\log (n+1) \geq \log 2$ and $\log (n+1) \geq \tfrac{1}{2} \log 2(n+1)$, so h satisfies (6.5) and (6.6). Thus the numbers h and 2h are rhythms compatible with P(x) and $P_1(x)$ respectively. The proof of the validity of formula (6.4) for decimal numbers and the point rhythm requires only a minor modification of the above proof. The proof of the theorem is completed.
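Theorem 6.1 can be tried out numerically. The following Python sketch is illustrative only: the function names are hypothetical, integer coefficients are assumed, and the bracket in (6.4) is evaluated as a digit count, which equals [log x] + 1 for a positive integer x.

```python
def spectrum(coeffs, h):
    """Ordinary decreasing spectrum P(10**h); coefficients from the highest power."""
    s = 0
    for c in coeffs:
        s = s * 10**h + c
    return s


def effective_values(s, h, n):
    """Read n signed sections of width h back from s, leftmost section first."""
    base = 10**h
    out = []
    for _ in range(n):
        r = s % base
        if r >= base // 2:      # big section: negative effective value
            r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


def rhythm(coeffs):
    """Formula (6.4) for integer coefficients: h = [log a(n+1)] + 1."""
    a = max(abs(c) for c in coeffs)
    n = len(coeffs) - 1
    return len(str(a * (n + 1)))


def corrected_spectrum(coeffs, h):
    """(6.2): spectrum of (-1)^n P(-x), i.e. signs flipped on the even places."""
    flipped = [c if i % 2 == 0 else -c for i, c in enumerate(coeffs)]
    return spectrum(flipped, h)


P = [1, 3, -2, 5, 4, -3, 1]          # x^6 + 3x^5 - 2x^4 + 5x^3 + 4x^2 - 3x + 1
h = rhythm(P)
S = spectrum(P, h)
S_bar = corrected_spectrum(P, h)
P1 = effective_values(S * S_bar, 2 * h, len(P))   # (6.3), read with rhythm 2h
print(h, S, S_bar)   # 2 1029805039701 969795040301
print(P1)            # [1, -13, -18, -21, 42, -1, 1]
```

The coefficients of the Graeffe transform are read off a single product of two integers, with no polynomial multiplication carried out explicitly.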


Remark. Graeffe's method requires the successive formation of a sequence of polynomials $P(x), P_1(x), P_2(x), \ldots, P_k(x)$, where each polynomial $P_i(x)$ $(i = 1, \ldots, k)$ is the Graeffe transform of the previous one. Once the last $P_k(x)$ has its zeros separated enough for the required accuracy, the transformation is finished. All the consecutive Graeffe transforms $P_1(x), \ldots, P_k(x)$ will have the compatible rhythms $2h, \ldots, 2^k h$ if h is obtained from (6.4).

We shall now explain a procedure for forming the corresponding sequence of spectra

$$S_1, S_2, \ldots, S_k \tag{6.7}$$

without the formation of any intermediary polynomial $P_i(x)$. We first form the spectra S and $\bar S$ directly from the given polynomial P(x). Their product $S_1 = S \bar S$ is the ordinary decreasing spectrum of the polynomial $P_1(x)$. We next need the corrected spectrum $\bar S_1$ in order to obtain $S_2$ and then to continue the formation of the sequence (6.7).

This can be done in the following way. We find the effective values of all even sections of the spectrum $S_1$. Afterwards we subtract from $S_1$ twice these effective values on the corresponding places. This gives the spectrum $\bar S_1$ and we can continue the spectral process until $S_k$ is obtained. From $S_k$ we obtain the polynomial $P_k(x)$ and proceed then in the classical way.

Example: We illustrate the method by making two consecutive Graeffe's transformations for the equation

$$x^6 + 3x^5 - 2x^4 + 5x^3 + 4x^2 - 3x + 1 = 0$$

Here h = 2, S = 1|02|98|05|03|97|01.

$\bar S$ can be obtained directly from P(x), but we shall obtain it from S. The effective values of the even sections are 3, 5, -3, so $\bar S$ is obtained in the following way:

S        = 1|02|98|05|03|97|01
              -6    -10    +6
$\bar S$ =   96|97|95|04|03|01

$S_1 = S \bar S$ = 9986|9981|9979|0041|9999|0001

Since a spectrum can never begin with a big section, the first section in $S_1$ must be preceded by a section of nominal value 0. Another way to explain the necessity for this revision of $S_1$ is that $S_1$, being the spectrum of a polynomial of the sixth degree, must have seven sections.

It is obvious that the rhythm is doubled with every Graeffe's transform, which is inconvenient. But we have the operation of condensation of spectra, which can be performed if each section of the spectrum begins with p digits of 0 or p digits of 9. The rhythm in this case can be condensed by p-1 units, or possibly p units (see Section 2.3). In the example above, the condensation can be made to the rhythm 2.

The spectrum $\bar S_1$ is obtained from $S_1$ in the same way as $\bar S$ is obtained from S. This gives

$\bar S_1$ = 1|12|82|21|42|01|01

$S_2 = S_1 \bar S_1$ = 0|9794|9861|8023|1686|0083|0001

From $S_2$ we obtain the polynomial

$$P_2(x) = x^6 - 205x^5 - 138x^4 - 1977x^3 + 1686x^2 + 83x + 1$$

We remark that usually the rhythm for $S_i$ is less than $2^i h$.
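The procedure of the first spectral method (form the corrected spectrum by subtracting twice the effective values of the even sections, then multiply) can be sketched as follows. This is a minimal illustration with hypothetical function names; it skips the condensation step, so the rhythm simply doubles at every transformation.

```python
def spectrum(coeffs, h):
    """Ordinary decreasing spectrum P(10**h); coefficients from the highest power."""
    s = 0
    for c in coeffs:
        s = s * 10**h + c
    return s


def effective_values(s, h, n):
    """Read n signed sections of width h back from s, leftmost section first."""
    base = 10**h
    out = []
    for _ in range(n):
        r = s % base
        if r >= base // 2:      # big section: negative effective value
            r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


def corrected_from_spectrum(s, h, n_sections):
    """Subtract twice the effective values of the even sections, in place."""
    values = effective_values(s, h, n_sections)
    s_bar = s
    for i, v in enumerate(values):
        if i % 2 == 1:          # 2nd, 4th, ... section counted from the left
            s_bar -= 2 * v * 10 ** (h * (n_sections - 1 - i))
    return s_bar


def graeffe_by_spectra(coeffs, h, k):
    """Coefficients of the k-th Graeffe transform, with no intermediate polynomial."""
    n_sections = len(coeffs)
    s = spectrum(coeffs, h)
    for _ in range(k):
        s = s * corrected_from_spectrum(s, h, n_sections)
        h *= 2                  # no condensation: the rhythm doubles
    return effective_values(s, h, n_sections)


P = [1, 3, -2, 5, 4, -3, 1]
print(corrected_from_spectrum(spectrum(P, 2), 2, 7))   # 969795040301
print(graeffe_by_spectra(P, 2, 1))   # [1, -13, -18, -21, 42, -1, 1]
print(graeffe_by_spectra(P, 2, 2))   # [1, -205, -138, -1977, 1686, 83, 1]
```

Only the final spectrum is decoded back into coefficients; the intermediate transforms exist only as integers.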

6.2. SECOND SPECTRAL METHOD. The number of operations which are needed to perform one Graeffe's transformation increases with $n^2$. The same number for the first spectral method increases with n. For the spectral method to be introduced in this section, the number of operations is constant: eleven operations only, independent of the value of n. This fact is of theoretical interest and also provides a possibility to use simpler programs for computers.

The  method  is  based  on  the  operations  of  choice  applied  to  the 
spectra.  The  corresponding  operations  of  choice  applied  to  the  poly¬ 
nomial  P(x)  may  be  recalled  from  Section  2.3. 

Theorem 6.2. The decreasing ordinary spectrum of the Graeffe transform $P_I(x)$ with the rhythm 2h is given by the formula

$$S_I = (S_1 + S_2)(S_1 - S_2) \tag{6.8}$$

where $S_1$ and $S_2$ are respectively the decreasing ordinary odd spectrum and even spectrum of the polynomial P(x) with the rhythm h. The number h is the same as in Theorem 6.1.

Proof: We start from the relation $P_I(x^2) = (-1)^n P(x) P(-x)$. Let $P_1(x)$ and $P_2(x)$ denote the parts of P(x) on the odd and even places respectively. From $P(x) = P_1(x) + P_2(x)$ and $P(-x) = (-1)^n [P_1(x) - P_2(x)]$, we obtain

$$P_I(x^2) = [P_1(x) + P_2(x)] \, [P_1(x) - P_2(x)].$$

Putting $x = 10^h$ we obtain

$$S_I = (S_1 + S_2)(S_1 - S_2).$$

The proof of the compatibility of the rhythms h and 2h was given in Theorem 6.1.


The spectral process of forming the sequence of spectra

$$S_I, S_{II}, S_{III}, \ldots \tag{6.9}$$

of Graeffe's transforms is as follows. First we form by the operations of choice the spectra $S_1$ and $S_2$ from the given polynomial P(x). The spectrum $S_I$ of the first Graeffe transform is obtained from (6.8). Then by the operation of internal or formal rounding off, we obtain the value $S_I^+$. Then $S_I^- = S_I^+ - S_I$. Applying four operations of choice we obtain from $S_I^+$ and $S_I^-$ the following four spectra: $S_{I1}^+$, $S_{I2}^+$, $S_{I1}^-$, $S_{I2}^-$. Then by two subtractions we obtain

$$S_{Ij} = S_{Ij}^+ - S_{Ij}^-, \qquad j = 1, 2.$$

Finally by one addition, one subtraction and one multiplication of spectra we obtain

$$S_{II} = (S_{I1} + S_{I2})(S_{I1} - S_{I2}).$$

Thus by means of the eleven operations mentioned above, we have obtained the spectrum $S_{II}$ from the spectrum $S_I$. The method can be continued in the same way.
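The eleven operations can be written out one by one. In the Python sketch below the operations of choice and of internal rounding off are emulated by decoding and re-encoding sections; that emulation, and all function names, are assumptions made for illustration (the paper treats these as primitive operations on the spectra themselves). Condensation is again omitted, so the result has rhythm 2h.

```python
def spectrum(coeffs, h):
    """Ordinary decreasing spectrum P(10**h); coefficients from the highest power."""
    s = 0
    for c in coeffs:
        s = s * 10**h + c
    return s


def effective_values(s, h, n):
    """Read n signed sections of width h back from s, leftmost section first."""
    base = 10**h
    out = []
    for _ in range(n):
        r = s % base
        if r >= base // 2:
            r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


def internal_round(s, h, n):
    """S+: keep the positive effective values, replace the others by zero."""
    return spectrum([v if v > 0 else 0 for v in effective_values(s, h, n)], h)


def choice(s, h, n, parity):
    """Operation of choice: keep the odd places (parity 0) or even places (parity 1)."""
    values = effective_values(s, h, n)
    return spectrum([v if i % 2 == parity else 0 for i, v in enumerate(values)], h)


def second_method_step(s, h, n):
    s_plus = internal_round(s, h, n)        # operation 1
    s_minus = s_plus - s                    # operation 2
    p1 = choice(s_plus, h, n, 0)            # operations 3 to 6: four choices
    p2 = choice(s_plus, h, n, 1)
    m1 = choice(s_minus, h, n, 0)
    m2 = choice(s_minus, h, n, 1)
    s1 = p1 - m1                            # operations 7 and 8
    s2 = p2 - m2
    return (s1 + s2) * (s1 - s2)            # operations 9, 10 and 11


S = spectrum([1, 3, -2, 5, 4, -3, 1], 2)
S_I = second_method_step(S, 2, 7)           # rhythm 4
S_II = second_method_step(S_I, 4, 7)        # rhythm 8
print(effective_values(S_I, 4, 7))    # [1, -13, -18, -21, 42, -1, 1]
print(effective_values(S_II, 8, 7))   # [1, -205, -138, -1977, 1686, 83, 1]
```

The count of operations in `second_method_step` does not depend on the degree of the polynomial, which is the point of the second method.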


Example: We consider the same example which was solved by the first spectral method. $S_1$ and $S_2$ are obtained directly from the polynomial, or by means of $S_j^+$, $S_j^-$, j = 1, 2, which have the following values:

$S_1^+$ = 1|00|00|00|04|00|01
$S_2^+$ = 3|00|05|00|00|00
$S_1^-$ = 2|00|00|00|00
$S_2^-$ = 3|00

$S_1$ = 99|98|00|04|00|01
$S_2$ = 3|00|04|99|97|00

$S_1 + S_2$ = 1|02|98|05|03|97|01
$S_1 - S_2$ = 96|97|95|04|03|01

$S_I$ = 0|9986|9981|9979|0041|9999|0001

By condensation to the rhythm 2, we obtain

$S_I$ = 0|86|81|79|41|99|01

Applying internal round off we have

$S_I^+$ = 1|00|00|00|42|00|01
$S_I^- = S_I^+ - S_I$ = 0|13|18|21|00|01|00
$S_{I1}^+$ = 1|00|00|00|42|00|01
$S_{I2}^+$ = 0
$S_{I1}^-$ = 18|00|00|00|00
$S_{I2}^-$ = 13|00|21|00|01|00
$S_{I1}$ = 0|99|82|00|42|00|01
$S_{I2}$ = -13|00|21|00|01|00

Finally we get

$S_{II}$ = 0|9794|9861|8023|1686|0083|0001


TABLE  1 

Comparison  of  the  Number  of  Operations 


The table shows the great difference in the number of required operations between the classical method and the spectral methods. The difference between the first and second spectral methods is not great. For n < 6 the first spectral method has a slight advantage over the second. However, even for small n, the simplicity of the program for computers makes the second method preferable.


7.  USE  OF  SPECTRA  IN  RECURSIVE  CALCULATIONS.  We  have  dealt  so  far 
with  fixed  problems,  that  is  with  problems  whose  data  were  known  from 
the  outset.  In  contrast,  calculations  using  recurrence  relations 
involve  results  that  enter  as  new  data  at  each  stage  of  such  calculations. 
For  the  spectral  method  this  means  that  the  starting  spectra  will  not 
be  static.  As  an  example  of  the  application  of  numerical  spectra  to 
computations  involving  recurrence  relations  we  consider  difference 
equations. 


7.1. APPLICATIONS OF NUMERICAL SPECTRA TO DIFFERENCE EQUATIONS.
Consider the second order difference equation

$$y_i = a y_{i-1}^2 + b y_{i-1} y_{i-2} + c y_{i-2}^2 + d y_{i-1} + e y_{i-2} + nh x_i$$

The main step in the application of the spectral method to the above equation is to form, from the quantities entering in the equation, two sequences so that the middle part of the product of their spectra will correspond to the right side of the above equation. An example of such sequences is

(I) $y_{i-1},\ y_{i-1},\ y_{i-2},\ d,\ e,\ nh x_i,\ 0$

(II) $0,\ 1,\ y_{i-2},\ y_{i-1},\ c y_{i-2},\ b y_{i-2},\ a y_{i-1}$

It is obvious that this pair of sequences is not unique. After finding the appropriate rhythm, we form the ordinary spectra $S_I$ and $S_{II}$ of the sequences (I) and (II). Then it follows that

$$y_i = M(S_I S_{II})$$

The advantage of the spectral calculations is again evident: the number of operations is considerably less than in the classical method. In this case instead of 10 multiplications and 5 additions, there is just one multiplication of spectra, 4 multiplications with constants (a, b, c, nh) and obviously the extraction of the middle part.

If the spectra are too long, it is possible to divide the problem into two or more parts. For example we can multiply the spectra of the following two pairs of sequences

(I') $y_{i-1},\ y_{i-1},\ y_{i-2}$

(II') $c y_{i-2},\ b y_{i-2},\ a y_{i-1}$

and

(I'') $d,\ e,\ nh x_i$

(II'') $1,\ y_{i-2},\ y_{i-1}$

We then add the obtained spectra and extract the middle part.

For desk calculations it is better to arrange the sequences so that one of them will be composed of constant quantities only. Thus for (I'') and (II'') it is better to arrange in the form

$d,\ e,\ 1$

$nh x_i,\ y_{i-2},\ y_{i-1}$.

To demonstrate the application of spectra of decimal numbers we shall solve the following linear nonhomogeneous difference equation of the second order

$$y_i = 0.5 y_{i-1} + 0.5 y_{i-2} - 0.04 (2i - 1)$$

subject to the initial conditions $y_0 = -0.2$, $y_1 = 0.3$. The above difference equation can be represented by the following two sequences

0.5,  0.5,  -0.4

0.1(2i-1),  $y_{i-2}$,  $y_{i-1}$

The numbers in this problem are decimal numbers. Thus we shall form D-spectra. In the case of integers, the rhythm would be $h = [\log 2nab] + 1$, where n is the number of terms in any sequence, and $a = \max |a_i|$, $b = \max |b_i|$ ($a_i$ and $b_i$ are the terms of the first and second sequences respectively).

One of the purposes of this example is to show how to obtain the main rhythm h and the point rhythm p for the D-spectrum. Let $p_i$ denote the number of decimals in $a_i$ and $q_i$ the number of decimals in $b_i$. Then the point rhythm p is given by $p = \max (p_i, q_i)$. The main rhythm is obtained from the same formula, where the quantities a and b are replaced by

$$a = \max_i \{ |a_i| \cdot 10^{p_i} \}, \qquad b = \max_i \{ |b_i| \cdot 10^{q_i} \}.$$

The spectrum of the first sequence is $S_1$ = .5|0.4|9.6, the second spectrum is $S_2$ = .2|9.8|0.3 and the product of the spectra is 1504932288.

The point rhythms are to be added since both spectra are D-spectra. Thus the point rhythm of the product is p = 2. The main rhythm remains the same. Thus the partitioning of the resulting spectrum gives

.15|.04|.93|.22|.88

and

$$y_2 = M(S_1 S_2) = -0.07.$$
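The decimal example can be reproduced with scaled integers. The sketch below is an assumption-laden illustration, not the authors' program: D-spectra are emulated by multiplying each sequence by a power of ten equal to its point rhythm, the function names are hypothetical, and the rhythm is recomputed at every step, which corresponds to diluting the spectra as the numbers grow.

```python
from decimal import Decimal


def spectrum(values, h):
    """Ordinary spectrum of an integer sequence with rhythm h."""
    s = 0
    for v in values:
        s = s * 10**h + v
    return s


def effective_values(s, h, n):
    """Read n signed sections of width h back from s, leftmost section first."""
    base = 10**h
    out = []
    for _ in range(n):
        r = s % base
        if r >= base // 2:
            r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


def decimals(x):
    """Number of decimal places of a finite Decimal."""
    return max(0, -x.as_tuple().exponent)


def middle_part(seq1, seq2):
    """Middle part of the product of the two D-spectra, as a Decimal."""
    p = max(decimals(x) for x in seq1)          # point rhythm of seq1
    q = max(decimals(x) for x in seq2)          # point rhythm of seq2
    ints1 = [int(x.scaleb(p)) for x in seq1]
    ints2 = [int(x.scaleb(q)) for x in seq2]
    n = len(seq1)
    a = max(abs(v) for v in ints1)
    b = max(abs(v) for v in ints2)
    h = len(str(2 * n * a * b))                 # h = [log 2nab] + 1
    product = spectrum(ints1, h) * spectrum(ints2, h)
    middle = effective_values(product, h, 2 * n - 1)[n - 1]
    return Decimal(middle).scaleb(-(p + q))     # point rhythms are added


constants = [Decimal("0.5"), Decimal("0.5"), Decimal("-0.4")]
y = [Decimal("-0.2"), Decimal("0.3")]
for i in range(2, 5):
    data = [Decimal("0.1") * (2 * i - 1), y[i - 2], y[i - 1]]
    y.append(middle_part(constants, data))
print(y[2], y[3], y[4])   # -0.07 -0.085 -0.3575
```

For i = 2 the scaled sequences are (5, 5, -4) and (3, -2, 3), the rhythm is 2 and the product is 50496 x 29803 = 1504932288, as in the text.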

We  remark  that  if  it  is  necessary  to  continue  the  calculations,  we 
have  a  choice  of  one  of  two  procedures: 

(a)  Take from the beginning the main rhythm h large enough to
obtain all the values up to $y_k$ (where k is fixed in advance).

(b)  Instruct  the  machine  to  dilute  automatically  the  spectra  to 
the  greater  rhythms  which  will  be  compatible  with  the  next  calculation. 

The  first  procedure  is  ordinarily  better  if  k  is  a  small  number. 

It is possible to use spectra for solving difference equations even when the data are approximate numbers (interval numbers). Methods of solving difference equations having interval numbers, without the use of spectra, are given for instance in [12] and [6].

7.2. APPLICATION OF SPECTRA TO BERNOULLI'S METHOD. As another
example  of  the  use  of  numerical  spectra  in  recursive  calculations, 
we  consider  Bernoulli's  method  for  solving  polynomial  equations.  We 
write  the  polynomial  equation  in  the  form 


Now we determine the rhythm. If all the $b_i$ are integers, the rhythm valid for p consecutive calculations is

$$h = [\log 2 (m u v)^p] + 1$$

where m is the number of $b_i$ different from zero and $u = \max |b_i|$, $v = \max |\ldots|$. The modification to noninteger $b_i$ is simple.

It is obvious that the kth term of the recurrence is given by $M(S_1 S_2)$, where $S_1$ and $S_2$ are the spectra of the sequences (I) and (II) respectively.

8. PSEUDOSPECTRA WITH APPLICATIONS TO DIFFERENTIAL EQUATIONS.

8.1. PSEUDOSPECTRA.

If a positive integer h is too small to be compatible with the polynomial or with the function, then it is impossible to form the spectrum even though it is possible to form by means of the formulae (2.6)-(2.9) the numbers S. But these numbers are surely not spectra. Such numbers are called pseudospectra. In the present section we establish the relevance of pseudospectra to numerical spectral analysis. In spectra, the digits of a given coefficient of the polynomial are not interlaced with the digits of the other coefficients of the polynomial. In pseudospectra such a mixing will occur. The first and key step in any spectral calculation is the determination of the rhythm h compatible with the problem and the given data. This rhythm is determined using majorizations which sometimes lead to a very large h. The result of the calculations sometimes indicates that such calculations could have been performed with a smaller rhythm. Also, in some problems of analysis, the determination of the rhythm becomes impractical due to the complexity of the quantities to be majorized. Pseudospectra enable us to carry on spectral analysis without an a priori knowledge of the rhythm. We now present the basic idea underlying pseudospectral processes.

The pseudospectral method starts with two natural numbers $h_1$, $h_2$ ($h_2 > h_1$). Using the fixed number $h_1$ and the appropriate formulae (2.6)-(2.9) we obtain numbers $S_1^{(1)}, S_2^{(1)}, \ldots$. The superscript is intended to indicate that the computations are carried out using the rhythm $h_1$, and the subscripts refer to different elements of the data. By performing the required operations (such operations were described abstractly in (3.2)) we obtain a number $S^{(1)}$ that can be a spectrum or a pseudospectrum. We proceed in the same manner with $h_2$, obtaining a resulting number $S^{(2)}$. By comparing these two numbers using the comparison theorem to be given below we can determine if $S^{(1)}$ is a spectrum. If $S^{(1)}$ is a spectrum, we are done. If it is not a spectrum, we take another natural number $h_3 > h_2$, form $S^{(3)}$ and repeat the same comparison, now with $S^{(2)}$ and $S^{(3)}$. This process can be carried on until, by means of comparison of $S^{(k-1)}$ and $S^{(k)}$, $S^{(k-1)}$ is found to be a spectrum.


Now  we  come  to  the  comparison  theorem  alluded  to  above,  which  can 
be  established  without  much  difficulty. 


Comparison Theorem. A necessary condition for $S^{(1)}$ and $S^{(2)}$ to be spectra is that the effective value of each section of $S^{(1)}$ is equal to the effective value of the corresponding section of $S^{(2)}$. This condition is also sufficient if the unknowns of the problems to be found by these spectra are all positive numbers (+).


Corollary.  If  not  all  effective  values  of  two  pseudospectra  but  only 
the  effective  values  of  the  first  k  sections  are  equal,  then  the 
pseudospectra  can  be  considered  as  spectra  in  these  sections  only. 


Remark: It may appear that the calculations of the numbers $S^{(1)}, S^{(2)}, \ldots, S^{(k)}$ require more operations than the calculation of one spectrum using a rhythm h obtained by majorization estimates. However, this is not necessarily true if the rhythm h is much larger than $h_k$.
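The comparison process can be illustrated on a small task, the product of two polynomials with positive integer coefficients. Everything in the sketch below is an assumption chosen for illustration: the function names, the choice of consecutive rhythms h and h + 1, and the convention that the leading section absorbs whatever is left of the number. Agreement of two consecutive readings is taken as the stopping rule in the sense of the comparison theorem.

```python
def spectrum(coeffs, h):
    """Number formed with rhythm h; a pseudospectrum if h is too small."""
    s = 0
    for c in coeffs:
        s = s * 10**h + c
    return s


def sections(s, h, n):
    """Effective values of n sections; the leading section keeps the remainder."""
    base = 10**h
    out = []
    for k in range(n):
        if k == n - 1:
            r = s                   # leading section: everything that is left
        else:
            r = s % base
            if r >= base // 2:
                r -= base
        out.append(r)
        s = (s - r) // base
    return out[::-1]


def product_by_pseudospectra(p, q, h_start=1):
    """Raise the rhythm until two consecutive readings of the product agree."""
    n = len(p) + len(q) - 1
    h1 = h_start
    previous = sections(spectrum(p, h1) * spectrum(q, h1), h1, n)
    while True:
        h2 = h1 + 1
        current = sections(spectrum(p, h2) * spectrum(q, h2), h2, n)
        if current == previous:     # comparison of S^(k-1) and S^(k)
            return previous, h1
        h1, previous = h2, current


coeffs, h_found = product_by_pseudospectra([12, 7], [9, 5])
print(coeffs, h_found)   # [108, 123, 35] 3
```

Here the readings for h = 1 and h = 2 are pseudospectra (the digits of neighbouring coefficients overlap), while h = 3 and h = 4 agree, so no majorization of the coefficients was needed.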


8.2. A SPECTRAL METHOD FOR SOLUTION OF ORDINARY DIFFERENTIAL
EQUATIONS OF THE FIRST ORDER.

In this section we develop spectral and pseudospectral methods to solve the initial value problem

$$y' = f(x,y), \qquad y(x_0) = y_0 \tag{8.1}$$

We assume that the function f(x,y) is expandable in Taylor series at $(x_0, y_0)$, i.e.,

$$f(x,y) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} A_{nm} (x - x_0)^n (y - y_0)^m$$

(+) If this is not the case, then additional conditions are required for the sufficiency. The formulation of these conditions can only be given for specific classes of problems.


The basis for the application of the pseudospectral method to this differential equation is a modified Picard's method. The modification was made by one of the authors of this paper [13]. Because this modification is not widely known, we explain it briefly. The method consists of finding a sequence of polynomials $y_1, y_2, \ldots$ by means of the iterative scheme

$$y_i(x) = y_0 + \int_{x_0}^{x} T_i f(x, y_{i-1}(x)) \, dx$$

where the operator $T_i$ gives the Taylor polynomial of degree i-1 associated with $f(x, y_{i-1})$ at $x_0$:

$$T_i f(x, y_{i-1}) = a_0^{(i)} + a_1^{(i)} (x - x_0) + \cdots + \frac{a_{i-1}^{(i)}}{(i-1)!} (x - x_0)^{i-1}$$

It was shown in [13] that $a_k^{(i)}$ is independent of i for each fixed k, so the ith approximation consists of adding one new term of the form $\int \frac{a_{i-1}}{(i-1)!} (x - x_0)^{i-1} dx$. Furthermore, the solution of (8.1) (cf. [13]) is given by

$$y = y_0 + \sum_{j=0}^{\infty} \frac{a_j}{(j+1)!} (x - x_0)^{j+1} \tag{8.2}$$


For application of spectral or pseudospectral methods, it is necessary to know in advance that the coefficients $a_i$ are integers or finite decimal numbers. For simplicity of exposition, we assume that all $a_i$ are integers; the modification is similar to previous modifications elaborated in this paper. These conditions are satisfied by a large class of functions; for instance, the rational function

$$f(x,y) = \frac{P(x,y)}{Q(x,y)} \tag{8.3}$$

satisfies these conditions if the coefficients of the polynomials P and Q are integers and $Q(x_0, y_0) = 1$.

In the iteration method described above there occur three kinds of functions. We construct from them the following table

$$\begin{array}{lll} & & y_0 \\ f(x, y_0) & T_1 f(x, y_0) & y_1 \\ f(x, y_1) & T_2 f(x, y_1) & y_2 \\ \cdots & \cdots & \cdots \end{array} \tag{8.4}$$

The elements of the first column are expandable in Taylor series, the elements of the second and third columns are polynomials. From the table (8.4), composed of functions, we form the table of corresponding numbers, spectra or pseudospectra, which we shall write in the following way:

$$\begin{array}{lll} & & s_0 \\ S_1 & \tilde S_1 & s_1 \\ S_2 & \tilde S_2 & s_2 \\ \cdots & \cdots & \cdots \end{array} \tag{8.5}$$

We note that the $s_i$ cannot be spectra because the functions $y_i$ in general have coefficients which are not finite decimal numbers. Thus the $s_i$ are pseudospectra. From the relation

$$S_i = f(10^{-h}, s_{i-1}) \tag{8.6}$$

it follows that the $S_i$ must also be pseudospectra. The terms of the second column are obviously obtained from the corresponding terms of the first column of the same table (8.5) by rounding off and consequently are designated by the symbol ~ having the meaning of ablation of decimals.


Now suppose we want to find the first p terms of the series (8.2). This means that we have to obtain the (p-1)th successive approximation

$$y_{p-1} = y_0 + \sum_{j=0}^{p-2} \frac{a_j}{(j+1)!} (x - x_0)^{j+1} \tag{8.7}$$

But instead of finding $y_{p-1}$ we shall find the polynomial $(p-1)! \, y_{p-1}$, since the coefficients of the latter polynomial are all integers.

We modify tables (8.4) and (8.5) accordingly. The modification of (8.5) is given below, where we have also augmented the table by a new column of integers $a_i$.


(8.8) 


•  •  •  • 

Here $S'_i = (i-1)! \, S_i$ and $\tilde S'_i$ is the value of $S'_i$ rounded off to (i-1)h decimals.


Now it remains only to find the relations by means of which the recursive calculations of the table (8.8) can be performed in a purely arithmetic way, row by row. It is easy to show that these relations are

$$\begin{aligned} S'_i &= (i-1)! \, f(10^{-h}, s_{i-1}) \\ a_i &= 10^{ih} (\tilde S'_{i+1} - i \tilde S'_i) \\ s_{i+1} &= s_i + \frac{10^{-h}}{(i+1)!} (\tilde S'_{i+1} - i \tilde S'_i), \qquad i = 1, 2, \ldots \end{aligned} \tag{8.9}$$

The last relation is obtained from

$$y_{i+1} = y_i + \frac{a_i}{(i+1)!} (x - x_0)^{i+1}$$

The initial values necessary to start the iterative processes are

$$s_0 = y_0, \qquad s_1 = s_0 + 10^{-h} \tilde S'_1, \qquad a_0 = \tilde S'_1 \tag{8.10}$$

We remark that the numbers $S'_i$ need only be calculated to the accuracy of (i-1)h decimals; this approximate value is just $\tilde S'_i$.

The calculations by means of formulae (8.9) can be performed using spectra or pseudospectra. For spectral processes it is necessary to have some knowledge about the coefficients $A_{nm}$ of the Taylor series expansion of the function f(x,y). Also the rhythm for spectra which may be obtained is ordinarily very large. For these two reasons we shall adopt pseudospectral processes in which we do not need to know the coefficients $A_{nm}$ and we can operate with smaller rhythms. The formulae for pseudospectral calculations are the same, but every number $\tilde S'_i$ shall be calculated at least in two ways, by means of two arbitrary rhythms $h_1$ and $h_2$ ($h_2 > h_1$). If the comparison of these two numbers $\tilde S_i'^{(1)}$ and $\tilde S_i'^{(2)}$ leads to the same value of $a_{i-1}$, then we pass to the calculation of the next row. If the comparison leads to different values, we take greater rhythms and repeat the calculation. The chance of making a mistake (because the conditions of the general comparison theorem are not sufficient) is very small. Even if it occurs, it is detected by checking the final result, that is, the approximate solution of the differential equation.


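Example:

$$y' = \frac{1 - x + y}{1 + x^2 y}, \qquad x_0 = 0, \quad y_0 = 0$$

It is necessary to calculate the coefficients $a_i$, i = 0, 1, 2, 3, 4. We take $h_1 = 1$, $h_2 = 2$. Table (8.8) (with the first column omitted) is in this case

$$\begin{array}{llll} \tilde S_1'^{(1)} = 1 & & & s_0 = 0 \\ \tilde S_1'^{(2)} = 1 & a_0 = 1 & & s_1^{(1)} = 0.1 \\ & & & s_1^{(2)} = 0.01 \\ \tilde S_2'^{(1)} = 1.0 & & & s_2^{(1)} = 0.10 \\ \tilde S_2'^{(2)} = 1.00 & a_1 = 0 & & s_2^{(2)} = 0.0100 \\ \tilde S_3'^{(1)} = 2.00 & & & s_3^{(1)} = 0.100 \\ \tilde S_3'^{(2)} = 2.0000 & a_2 = 0 & & s_3^{(2)} = 0.010000 \\ \tilde S_4'^{(1)} = 5.994 & & & s_4^{(1)} = 0.099975 \\ \tilde S_4'^{(2)} = 5.999994 & a_3 = -6 & & s_4^{(2)} = 0.0099999975 \\ \tilde S_5'^{(1)} = 23.9754 & & & \\ \tilde S_5'^{(2)} = 23.99997594 & a_4 = -6 & & \end{array}$$

Thus we obtain

$$y = x - \frac{x^4}{4} - \frac{x^5}{20} + \cdots$$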

The  spectral  and  pseudospectral  methods  can  be  used  for  differential 
equations  of  higher  order  and  for  systems  of  differential  equations. 
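The recursive relations (8.9)-(8.10) are purely arithmetic and can be sketched in a few lines. The code below is an illustration under stated assumptions: exact rational arithmetic with `fractions.Fraction` stands in for the decimal computation, rounding to the nearest value is used for the rounding off, x_0 = 0 is assumed as in the example, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def f(x, y):
    # Right-hand side of the example: y' = (1 - x + y) / (1 + x^2 y)
    return (1 - x + y) / (1 + x * x * y)


def taylor_numerators(f, y0, h, count):
    """Integers a_0 .. a_{count-1} from relations (8.9)-(8.10), with x_0 = 0."""
    x = Fraction(1, 10**h)              # evaluation point 10^(-h)
    s = Fraction(y0)                    # s_0 = y_0
    a, previous = [], None
    for i in range(1, count + 1):
        places = (i - 1) * h            # decimals kept in row i
        S = factorial(i - 1) * f(x, s)                       # S'_i
        S = Fraction(round(S * 10**places), 10**places)      # rounded off
        a_new = S if i == 1 else 10**places * (S - (i - 1) * previous)
        a.append(a_new)                 # this is a_{i-1}
        s += a_new * x**i / factorial(i)                     # s_i
        previous = S
    return a


a_h1 = taylor_numerators(f, 0, 1, 5)    # rhythm h_1 = 1
a_h2 = taylor_numerators(f, 0, 2, 5)    # rhythm h_2 = 2
print(a_h1 == a_h2, [int(v) for v in a_h1])   # True [1, 0, 0, -6, -6]

# Coefficients a_j / (j+1)! of the series (8.2)
series = [Fraction(v, factorial(j + 1)) for j, v in enumerate(a_h1)]
print(series[3], series[4])             # -1/4 -1/20
```

Both rhythms lead to the same integers, which is the agreement the pseudospectral comparison looks for before a row is accepted.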
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